Larrabee at SIGGRAPH

Maybe it's just a developer preview? That'd make sense given Larrabee's alleged focus on "non-traditional" approaches, both in programming and in image-quality features.

It would do Intel no good to have fully fledged hardware out the door but not a single piece of software available showing off some of Larrabee's strengths and benefits.
 
More SIGGRAPH news: id will be presenting their "sparse voxel octree raycasting stuff" at the session "Beyond Programmable Shading: In Action":
http://ompf.org/forum/viewtopic.php?f=3&p=8319

Lots of interesting bits from Jon Olick, including:
That's a tough question. CUDA has some interesting benefits, but lacks a general caching architecture. That significantly hurts raycasting. Larrabee, on the other hand, has generic CPUs with generic caches, so probably Larrabee on that one. However, it's impossible to tell without real hardware in hand.
 
Not sure why they would not want to just use cached texture fetches instead of direct un-cached access to global memory, but I'm sure that will be answered soon...
 
Not sure why they would not want to just use cached texture fetches instead of direct un-cached access to global memory, but I'm sure that will be answered soon...
Yeah that's not clear... maybe R/W hazards between cores? Not sure what they'd be writing at that stage, but I guess we'll find out soon enough :)
 
Acceleration data structures want to be cached, shared, reused with low latency and without consuming significant external memory bandwidth. Leaf node data, however, is much more like conventional texturing, where you expect to go all the way out to memory and need enough fetches in flight to cover that latency.
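
To make that split concrete, here's a minimal sketch of the two kinds of data (all names and sizes are made up, not id's or Intel's actual layout):

```cpp
#include <cstdint>

// Inner nodes: tiny, revisited by many rays with low latency, so you want
// them resident in cache and costing essentially no external bandwidth.
struct InnerNode {
    uint32_t first_child;  // index of the first existing child
    uint8_t  child_mask;   // which of the 8 octants are populated
};

// Leaf data: fat bricks touched roughly once per ray, so each access behaves
// like a texture fetch that goes all the way to memory; you hide the latency
// by keeping many such fetches in flight rather than by caching.
struct LeafBrick {
    uint8_t rgba[4 * 4 * 4 * 4];  // e.g. a 4x4x4 block of voxel colours
};
```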
 
Or in other words... locality is trash, gobs of cache is the only way to make it work ;) (I assume they do use cached texture reads, but since rays near edges will traverse such wildly different parts of the data, you are going to be thrashing the cache; this is nothing like normal texture accesses.)
 
I'm only talking about primary rays, where locality is very much like conventional texturing with a rasterizer. Secondary rays have huge locality issues, and AFAIK, are nowhere near a solved problem for real-time graphics hardware architectures.
 
You'll have to elaborate. Your argument is concise, but not convincing.

My flippant response would have been, "Yes, it is.", but that isn't enlightening.

Whether a frame is rendered by tracing primary rays or by rasterizing, the final shaded pixels should look the same, and the shading process should reference the same texels. Rasterization is object order while tracing is image order, so you get some differences in locality there, but in a way that is *better* for tracing.

Educate me. Don't just tell me I'm wrong. :)
 
Pixels which don't share an object will generally not share a texture either. If we assume for a moment id's unique texturing, then when you are done with an object you are done with its texture. With object-order rendering, the hardware can exploit that without rendering other objects in between; with image-order rendering you can't make that guarantee. In what way is that better?
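
Roughly, the contrast I have in mind as a toy sketch (everything here is hypothetical, just to illustrate the cache behaviour, not either renderer):

```cpp
#include <vector>

struct Texture { /* texels ... */ };
struct Object  { Texture texture; };
struct Pixel   { int x, y; };
struct Hit     { const Object* object; };

std::vector<Object> objects;
std::vector<Pixel>  pixels;

std::vector<Pixel> pixels_covered_by(const Object&);  // rasteriser coverage
Hit  trace_primary_ray(const Pixel&);                 // cast a ray into the scene
void shade(const Pixel&, const Texture&);

void object_order() {
    // Rasteriser style: every sample from one object's texture happens
    // back-to-back, so that texture stays hot in cache and is done for good
    // once the object is done.
    for (const Object& obj : objects)
        for (const Pixel& px : pixels_covered_by(obj))
            shade(px, obj.texture);
}

void image_order() {
    // Ray casting style: consecutive pixels can land on different objects,
    // so several objects' textures compete for the same cache lines at once.
    for (const Pixel& px : pixels) {
        Hit h = trace_primary_ray(px);
        shade(px, h.object->texture);
    }
}
```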
 
If we assume for a moment id's unique texturing, then when you are done with an object you are done with its texture. With object-order rendering, the hardware can exploit that without rendering other objects in between; with image-order rendering you can't make that guarantee.
Well, it's not a guarantee, but you can arrange and run your rays in such a way that you expect them to behave in a relatively cache-coherent manner, as long as you don't split the rays across a data structure boundary (i.e. often via incoherent control flow on the GPU). Certainly GPUs can traverse kd-trees pretty fast for primary rays, as long as those trees don't get too dense near the leaves. Yes, this starts to sound a lot like rasterization, but indeed it can be made efficient and cache-coherent in similar situations to rasterization, although much less explicitly.
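
A minimal sketch of that "arrange and run your rays" point, assuming simple screen-tiled primary rays (names and tile size are made up):

```cpp
struct Ray { /* origin, direction ... */ };

Ray  camera_ray(int x, int y);  // build the primary ray for a pixel
void traverse(const Ray&);      // kd-tree / octree walk plus shading

void render(int width, int height) {
    constexpr int TILE = 8;     // 8x8 packets of coherent primary rays
    for (int ty = 0; ty < height; ty += TILE)
        for (int tx = 0; tx < width; tx += TILE)
            for (int y = ty; y < ty + TILE; ++y)
                for (int x = tx; x < tx + TILE; ++x)
                    traverse(camera_ray(x, y));  // neighbours reuse cached nodes
}
```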

That said, that's the real advantage of voxels... to avoid destroying cache coherency and the like you need to be able to stop traversing once you reach a suitable amount of divergence (i.e. LOD). This is also necessary to avoid aliasing. Voxels allow this to be done very naturally and cleanly, although they have other problems. That said, if it can be done quickly enough it may make sense to build voxel-like data structures on the fly from animated polygons just for LOD, since building progressive meshes or other topologically constrained structures on the fly is probably going to be too expensive for the immediate future.
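
Here's a rough sketch of that LOD cut-off, with hypothetical types and helpers: stop descending the octree once a node projects to about a pixel or less, so divergent rays never reach the deep, cache-unfriendly levels, and the cut-off doubles as anti-aliasing (roughly one voxel per pixel):

```cpp
struct Ray  { /* origin, direction ... */ };
struct Hit  { bool valid; /* colour, normal ... */ };
struct Node {
    float       size;         // world-space edge length of this node
    bool        leaf;
    const Node* children[8];  // null where the octree is sparse
};

bool intersect(const Node&, const Ray&, float* t_enter);  // ray/box test
Hit  shade_voxel(const Node&, const Ray&);

Hit raycast(const Node& node, const Ray& ray, float pixel_footprint) {
    float t;
    if (!intersect(node, ray, &t))
        return {false};

    // Projected size is below roughly one pixel (or it's a real leaf):
    // treat the node as a leaf and stop descending.
    if (node.leaf || node.size / t < pixel_footprint)
        return shade_voxel(node, ray);

    // Otherwise recurse; ideally visit children front to back so the first
    // hit terminates the ray early.
    for (const Node* child : node.children)
        if (child) {
            Hit h = raycast(*child, ray, pixel_footprint);
            if (h.valid)
                return h;
        }
    return {false};
}
```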
 
So instead of Larrabee with 16-24 cores on 45 nm in 2009, it'll be 32 cores, perhaps on 32 nm, in Q1 or Q2 2010, which makes sense.


Larrabee should go beyond Direct3D 11 in some ways, but will it support all Direct3D 11 features in hardware?

In the same sense that Xenos goes well beyond Direct3D 9 / Shader Model 3.0, and even beyond Direct3D 10 / Shader Model 4.0 in a few areas, yet is not all the way up to D3D 10 / Shader Model 4.0 in other areas. Xenos is usually considered halfway or more than halfway between Direct3D 9 and Direct3D 10.

So Larrabee is supposed to consist of 32 modified P54C cores running at 2-3 GHz (more or less), L2 cache(s), texture sampling hardware, perhaps some other rasterisation bits, and supposedly a 1024-bit external memory bus. What am I missing?
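
For what it's worth, a quick back-of-envelope on those numbers; the 16-wide vector unit and fused multiply-add per lane are my assumptions, not confirmed specs:

```cpp
#include <cstdio>

int main() {
    const double cores         = 32;
    const double vector_lanes  = 16;   // assumed 512-bit single-precision VPU
    const double flops_per_fma = 2;    // multiply + add
    const double clock_ghz     = 2.0;  // low end of the rumoured 2-3 GHz

    // 32 * 16 * 2 * 2 GHz = ~2048 GFLOPS, i.e. roughly 2 TFLOPS peak.
    std::printf("~%.0f single-precision GFLOPS peak\n",
                cores * vector_lanes * flops_per_fma * clock_ghz);
}
```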
 
All defined operations are natively-supported and accelerated

native meaning on the GPU (a multi-core x86 with extensions) rather than the CPU (a multi-core x86 without extensions) and accelerated meaning running at the native clock-speed of the multi-core x86 with extensions rather than the other multi-core x86 which isn't extended? (regardless of which is actually quicker?).

In this context, which bits of DX10 and/or DX11 do G200/RV770 support in hardware and which do they emulate in software?
 
native meaning on the GPU (a multi-core x86 with extensions) rather than the CPU (a multi-core x86 without extensions)

Yes and no. Yes in that you're correct overall, no in that your description is not quite fair to Larrabee. Larrabee has fixed-function, dedicated hardware for many rasterization tasks, hardware that no CPU has (or will have, until Fusion at least).

and accelerated meaning running at the native clock-speed of the multi-core x86 with extensions rather than the other multi-core x86 which isn't extended? (regardless of which is actually quicker?).

Accelerated meaning benefitting from the fixed-function hardware (or extensions, depending upon case) rather than having to be translated to ISA-native instructions by the front-end of the microprocessor.

In this context, which bits of DX10 and/or DX11 do G200/RV770 support in hardware and which do they emulate in software?

I don't have an answer to that question. I'm not an insider, just an enthusiast and a wannabe engineer :p
 
Yes and no. Yes in that you're correct overall, no in that your description is not quite fair to Larrabee. Larrabee has fixed-function, dedicated hardware for many rasterization tasks, hardware that no CPU has (or will have, until Fusion at least).

Well, fixed-function is what I meant by "with extensions"; I wasn't really referring to SSEx in that sense.

Accelerated meaning benefitting from the fixed-function hardware
OK, but it seems to me that traditional GPUs are progressively moving away from fixed-function for most or all of the higher-order functionality. Does DX11 bring stuff to the table which requires or benefits from additional fixed-function hardware not already available? The statements I've read from MS suggest not.

That's what confused me about MD1988's question; it seemed somewhat anachronistic in terms of what's fixed-function versus programmable, and what's implemented in hardware versus software.
 