Larrabee at SIGGRAPH

Maybe it's just a developer preview? That'd make sense given Larrabee's alleged focus on "non-traditional" approaches, both in programming and in image-quality features.

It would do Intel no good to have fully fledged hardware out the door but not a single piece of software available showing off some of Larrabee's strengths and benefits.
 
More SIGGRAPH news: id will be presenting their "sparse voxel octree raycasting stuff" at the session "Beyond Programmable Shading: In Action":
http://ompf.org/forum/viewtopic.php?f=3&p=8319

Lots of interesting bits from Jon Olick, including:
That's a tough question. CUDA has some interesting benefits, but lacks a general caching architecture. That significantly hurts raycasting. Larrabee, on the other hand, has generic CPUs with generic caches, so probably Larrabee on that one. However, it's impossible to tell without real hardware in hand.
 
Not sure why they would not want to just use cached texture fetches instead of direct un-cached access to global memory, but I'm sure that will be answered soon...
 
Not sure why they would not want to just use cached texture fetches instead of direct un-cached access to global memory, but I'm sure that will be answered soon...
Yeah that's not clear... maybe R/W hazards between cores? Not sure what they'd be writing at that stage, but I guess we'll find out soon enough :)
 
Acceleration data structures want to be cached, shared, reused with low latency and without consuming significant external memory bandwidth. Leaf node data, however, is much more like conventional texturing, where you expect to go all the way out to memory and need enough fetches in flight to cover that latency.
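
To make that split concrete, here's a minimal sketch of the two kinds of data (all names and sizes are made up, not id's or Intel's actual layout):

```cpp
#include <cstdint>

// Inner nodes: tiny, revisited by many rays with low latency, so you want
// them resident in cache and costing essentially no external bandwidth.
struct InnerNode {
    uint32_t first_child;  // index of the first existing child
    uint8_t  child_mask;   // which of the 8 octants are populated
};

// Leaf data: fat bricks touched roughly once per ray, so each access behaves
// like a texture fetch that goes all the way to memory; you hide the latency
// by keeping many such fetches in flight rather than by caching.
struct LeafBrick {
    uint8_t rgba[4 * 4 * 4 * 4];  // e.g. a 4x4x4 block of voxel colours
};
```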
 
Or in other words... locality is trash, gobs of cache is the only way to make it work ;) (I assume they do use cached texture reads, but since rays near edges will traverse such wildly different parts of the data, you are going to be thrashing the cache; this is nothing like normal texture accesses.)
 
I'm only talking about primary rays, where locality is very much like conventional texturing with a rasterizer. Secondary rays have huge locality issues, and AFAIK, are nowhere near a solved problem for real-time graphics hardware architectures.
 
You'll have to elaborate. Your argument is concise, but not convincing.

My flippant response would have been, "Yes, it is.", but that isn't enlightening.

Whether a frame is rendered by tracing primary rays or by rasterizing, the final shaded pixels should look the same, and the shading process should reference the same texels. Rasterization is object order while tracing is image order, so you get some differences in locality there, but in a way that is *better* for tracing.

Educate me. Don't just tell me I'm wrong. :)
 
Pixels which don't share an object will generally not share a texture either. If we assume for a moment id's unique texturing, then when you are done with an object you are done with its texture. With object-order rendering, the hardware can exploit that without rendering other objects in between; with image-order rendering you can't make that guarantee. In what way is that better?
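
Roughly, the contrast I have in mind as a toy sketch (everything here is hypothetical, just to illustrate the cache behaviour, not either renderer):

```cpp
#include <vector>

struct Texture { /* texels ... */ };
struct Object  { Texture texture; };
struct Pixel   { int x, y; };
struct Hit     { const Object* object; };

std::vector<Object> objects;
std::vector<Pixel>  pixels;

std::vector<Pixel> pixels_covered_by(const Object&);  // rasteriser coverage
Hit  trace_primary_ray(const Pixel&);                 // cast a ray into the scene
void shade(const Pixel&, const Texture&);

void object_order() {
    // Rasteriser style: every sample from one object's texture happens
    // back-to-back, so that texture stays hot in cache and is done for good
    // once the object is done.
    for (const Object& obj : objects)
        for (const Pixel& px : pixels_covered_by(obj))
            shade(px, obj.texture);
}

void image_order() {
    // Ray casting style: consecutive pixels can land on different objects,
    // so several objects' textures compete for the same cache lines at once.
    for (const Pixel& px : pixels) {
        Hit h = trace_primary_ray(px);
        shade(px, h.object->texture);
    }
}
```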
 
If we assume for a moment id's unique texturing, then when you are done with an object you are done with its texture. With object-order rendering, the hardware can exploit that without rendering other objects in between; with image-order rendering you can't make that guarantee.
Well, it's not a guarantee, but you can arrange and run your rays in such a way that you expect them to behave in a relatively cache-coherent manner, as long as you don't split the rays across a data structure boundary (i.e. often via incoherent control flow on the GPU). Certainly GPUs can traverse kd-trees pretty fast for primary rays, as long as those trees don't get too dense near the leaves. Yes, this starts to sound a lot like rasterization, but indeed it can be made efficient and cache-coherent in similar situations to rasterization, although much less explicitly.
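
A minimal sketch of that "arrange and run your rays" point, assuming simple screen-tiled primary rays (names and tile size are made up):

```cpp
struct Ray { /* origin, direction ... */ };

Ray  camera_ray(int x, int y);  // build the primary ray for a pixel
void traverse(const Ray&);      // kd-tree / octree walk plus shading

void render(int width, int height) {
    constexpr int TILE = 8;     // 8x8 packets of coherent primary rays
    for (int ty = 0; ty < height; ty += TILE)
        for (int tx = 0; tx < width; tx += TILE)
            for (int y = ty; y < ty + TILE; ++y)
                for (int x = tx; x < tx + TILE; ++x)
                    traverse(camera_ray(x, y));  // neighbours reuse cached nodes
}
```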

That said, that's the real advantage of voxels... to avoid destroying cache coherency and the like you need to be able to stop traversing once you reach a suitable amount of divergence (i.e. LOD). This is also necessary to avoid aliasing. Voxels allow this to be done very naturally and cleanly, although they have other problems. That said, if it can be done quickly enough it may make sense to build voxel-like data structures on the fly from animated polygons just for LOD, since building progressive meshes or other topologically constrained structures on the fly is probably going to be too expensive for the immediate future.
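
Here's a rough sketch of that LOD cut-off, with hypothetical types and helpers: stop descending the octree once a node projects to about a pixel or less, so divergent rays never reach the deep, cache-unfriendly levels, and the cut-off doubles as anti-aliasing (roughly one voxel per pixel):

```cpp
struct Ray  { /* origin, direction ... */ };
struct Hit  { bool valid; /* colour, normal ... */ };
struct Node {
    float       size;         // world-space edge length of this node
    bool        leaf;
    const Node* children[8];  // null where the octree is sparse
};

bool intersect(const Node&, const Ray&, float* t_enter);  // ray/box test
Hit  shade_voxel(const Node&, const Ray&);

Hit raycast(const Node& node, const Ray& ray, float pixel_footprint) {
    float t;
    if (!intersect(node, ray, &t))
        return {false};

    // Projected size is below roughly one pixel (or it's a real leaf):
    // treat the node as a leaf and stop descending.
    if (node.leaf || node.size / t < pixel_footprint)
        return shade_voxel(node, ray);

    // Otherwise recurse; ideally visit children front to back so the first
    // hit terminates the ray early.
    for (const Node* child : node.children)
        if (child) {
            Hit h = raycast(*child, ray, pixel_footprint);
            if (h.valid)
                return h;
        }
    return {false};
}
```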
 
So instead of Larrabee with 16-24 cores on 45 nm in 2009, it'll be 32 cores, perhaps on 32 nm, in Q1 or Q2 2010, which makes sense.


Larrabee should go beyond Direct3D 11 in some ways, but will it support all Direct3D 11 features in hardware?

In the same sense that Xenos goes well beyond Direct3D 9 / Shader Model 3.0, and even beyond Direct3D 10 / Shader Model 4.0 in a few areas, yet is not all the way up to D3D 10 / Shader Model 4.0 in other areas. Xenos is usually considered halfway or more than halfway between Direct3D 9 and Direct3D 10.

So Larrabee is supposed to consist of 32 modified P54C cores running at 2-3 GHz (more or less), L2 cache(s), texture sampling hardware, perhaps some other rasterisation bits, and supposedly a 1024-bit external memory bus. What am I missing?
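
For what it's worth, a quick back-of-envelope on those numbers; the 16-wide vector unit and fused multiply-add per lane are my assumptions, not confirmed specs:

```cpp
#include <cstdio>

int main() {
    const double cores         = 32;
    const double vector_lanes  = 16;   // assumed 512-bit single-precision VPU
    const double flops_per_fma = 2;    // multiply + add
    const double clock_ghz     = 2.0;  // low end of the rumoured 2-3 GHz

    // 32 * 16 * 2 * 2 GHz = ~2048 GFLOPS, i.e. roughly 2 TFLOPS peak.
    std::printf("~%.0f single-precision GFLOPS peak\n",
                cores * vector_lanes * flops_per_fma * clock_ghz);
}
```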
 
All defined operations are natively-supported and accelerated

native meaning on the GPU (a multi-core x86 with extensions) rather than the CPU (a multi-core x86 without extensions) and accelerated meaning running at the native clock-speed of the multi-core x86 with extensions rather than the other multi-core x86 which isn't extended? (regardless of which is actually quicker?).

In this context, which bits of DX10 and/or DX11 do G200/RV770 support in hardware and which do they emulate in software?
 
native meaning on the GPU (a multi-core x86 with extensions) rather than the CPU (a multi-core x86 without extensions)

Yes and no. Yes in that you're correct overall, no in that your description is not quite fair to Larrabee. Larrabee has fixed-function, dedicated hardware for many rasterization tasks, hardware that no CPU has (or will have, until Fusion at least).

and accelerated meaning running at the native clock-speed of the multi-core x86 with extensions rather than the other multi-core x86 which isn't extended? (regardless of which is actually quicker?).

Accelerated meaning benefitting from the fixed-function hardware (or extensions, depending upon case) rather than having to be translated to ISA-native instructions by the front-end of the microprocessor.

In this context, which bits of DX10 and/or DX11 do G200/RV770 support in hardware and which do they emulate in software?

I don't have an answer to that question. I'm not an insider, just an enthusiast and a wannabe engineer :p
 
Yes and no. Yes in that you're correct overall, no in that your description is not quite fair to Larrabee. Larrabee has fixed-function, dedicated hardware for many rasterization tasks, hardware that no CPU has (or will have, until Fusion at least).

Well, fixed-function is what I meant by "with extensions"; I wasn't really referring to SSEx in that sense.

Accelerated meaning benefitting from the fixed-function hardware
OK, but it seems to me that traditional GPUs are progressively moving away from fixed-function for most or all of the higher-order functionality. Does DX11 bring stuff to the table which requires or benefits from additional fixed-function hardware not already available? The statements I've read from MS suggest not.

That's what confused me about MD1988's question; it seemed somewhat anachronistic in terms of what's fixed-function versus programmable, and what's implemented in hardware versus software.
 