Larrabee delayed to 2011?

Not necessarily... I maintain that the most important use of tessellation is to *remove* triangles that would otherwise be there without it. If you're rendering much more than a couple million or so triangles, you're not doing your LOD/occlusion culling well enough (at least for today's screen resolutions) :) The goal is triangles of around 4-8 pixels.
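For a sense of scale (my own back-of-the-envelope, assuming a 1920x1200 target): that's roughly 2.3M pixels, so at 4-8 pixels per triangle you only ever need on the order of 300-600k triangles actually covering the screen; anything far beyond that is work the viewer can't see.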
I like that view of tessellation.

As for triangle size, 4-8 pixels would be a decent lower bound for triangles viewed head on, but even if your tessellator aims for that, on screen 20% will be < 3 pixels due to obliqueness, 40% will be backfacing, and 20% will be off screen (and that's for the good devs who implement decent frustum culling; the lack of tiling on Xbox 360 leads me to think most don't go beyond really coarse culling).

I'm probably mistaken, but didn't he (Tom Forsyth, in the talk) mention that setup (or at least the part of it that relates to tri/clk; say hi to my ignorance, it's shy) wasn't an issue? I remember him saying something like 500M+ tri/s being more than enough. Whether that was for Larrabee or GPUs in general, and whether that takes tessellation into account, I don't remember.
He's not looking at it the right way.

If you have a scene with 2 million polygons inside the view frustum, you have to render your cascaded shadow maps (some geometry gets sent to the GPU in two cascades) and lots of off-screen geometry. That's maybe 3 million polys, thus taking 6ms at 500Mtris/s. 20% of the triangles are clipped, half of the rest are backfacing, another 20% only contribute to 10% of the total pixel count load, and maybe 100k triangles go towards each face of a 512x512 cube map. So that's another 0.4+0.8+0.4+0.6=2.2M triangles, taking another 4.4 ms.

So for 60fps, you're now left with 6.2ms - only 37% of the frame time - to render 90% of the on-screen pixels. Only 800k triangles are frontfacing and on screen, and after overdraw only 300k are visible. You render at 1920x1200, so the average is 8 pix/tri.
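Here's the same arithmetic as a quick sketch (just my back-of-the-envelope numbers from above, nothing measured):

[code]
// Rough frame-budget arithmetic for the numbers above (assumed, not measured):
// 2M triangles in the view frustum, a 500 Mtri/s setup rate, 60 fps target.
#include <cstdio>

int main() {
    const double frameMs  = 1000.0 / 60.0;   // ~16.7 ms per frame
    const double msPerTri = 1000.0 / 500e6;  // 500 Mtri/s setup rate

    const double shadowAndOffscreenTris = 3.0e6;  // cascades + off-screen geometry
    const double clippedTris            = 0.4e6;  // 20% of the 2M in the frustum
    const double backfacingTris         = 0.8e6;  // half of the remaining 1.6M
    const double lowContributionTris    = 0.4e6;  // 20% covering only ~10% of pixels
    const double cubeMapTris            = 0.6e6;  // ~100k per face of a cube map

    const double overheadMs = (shadowAndOffscreenTris + clippedTris + backfacingTris +
                               lowContributionTris + cubeMapTris) * msPerTri; // ~10.4 ms
    const double leftMs     = frameMs - overheadMs;                           // ~6.2 ms

    const double visibleTris = 0.3e6;         // front-facing, on screen, after overdraw
    const double pixels      = 1920.0 * 1200.0;

    std::printf("overhead: %.1f ms, left: %.1f ms (%.0f%% of the frame), %.1f pix/tri\n",
                overheadMs, leftMs, 100.0 * leftMs / frameMs, pixels / visibleTris);
    return 0;
}
[/code]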

500Mtri/sec is more than enough? I don't think so...

(Personally, I care about lighting far more than polygon count, but triangles are still useful...)
 
Tessellation factors should really be calculated from a screen-space, view-dependent metric, not from distance.
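Roughly what I mean, per patch edge (a minimal sketch only, with made-up helper types and a ~6-pixel target; not anyone's shipping code):

[code]
#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };

// Transform a point (w = 1) by a 4x4 view-projection matrix.
static Vec4 transform(const Mat4& m, const Vec3& v) {
    return {
        m.m[0][0] * v.x + m.m[0][1] * v.y + m.m[0][2] * v.z + m.m[0][3],
        m.m[1][0] * v.x + m.m[1][1] * v.y + m.m[1][2] * v.z + m.m[1][3],
        m.m[2][0] * v.x + m.m[2][1] * v.y + m.m[2][2] * v.z + m.m[2][3],
        m.m[3][0] * v.x + m.m[3][1] * v.y + m.m[3][2] * v.z + m.m[3][3]
    };
}

// Tessellation factor proportional to the edge's length *after* projection.
// A foreshortened edge on an oblique patch projects short and gets a low
// factor, where a plain edge-length/distance metric would subdivide it as
// heavily as a head-on edge of the same world-space length.
// (Edges crossing the near plane are ignored here for brevity.)
float edgeTessFactor(const Vec3& a, const Vec3& b, const Mat4& viewProj,
                     float screenW, float screenH, float targetPixels = 6.0f)
{
    const Vec4 ca = transform(viewProj, a);
    const Vec4 cb = transform(viewProj, b);

    // Perspective divide to NDC, then scale to pixel coordinates.
    const float ax = (ca.x / ca.w * 0.5f + 0.5f) * screenW;
    const float ay = (ca.y / ca.w * 0.5f + 0.5f) * screenH;
    const float bx = (cb.x / cb.w * 0.5f + 0.5f) * screenW;
    const float by = (cb.y / cb.w * 0.5f + 0.5f) * screenH;

    const float pixels = std::sqrt((ax - bx) * (ax - bx) + (ay - by) * (ay - by));

    // One subdivision per 'targetPixels' pixels of projected edge, at least 1.
    const float f = pixels / targetPixels;
    return f < 1.0f ? 1.0f : f;
}
[/code]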
 
Indeed - it's not that hard to do decent silhouette tessellation (I think even the DXSDK has an example).
That's fine, but you still get a lot of very oblique triangles.

Also, you don't want to get caught in the mentality that tessellation is only meant to round off edges at silhouettes, because ultimately you want to use it with displacement mapping for seamless LOD. I suppose it's possible to forgo tessellation of non-silhouette triangles by using parallax occlusion mapping, but there are a lot of tradeoffs going on there.
 
Ignoring curvature and displacement for a moment ... if the tessellation factor has a constant ratio with edge length in screen space, why would you get a lot of very oblique triangles?
 
Ignoring curvature and displacement for a moment ... if the tessellation factor has a constant ratio with edge length in screen space, why would you get a lot of very oblique triangles?
I'm not sure if I know what you're asking. You always get a lot of oblique triangles. A sphere, for example, could be made of triangles with an edge length of 4 in screen space, so the center polys will cover ~8 pixels, but half of the sphere's visible triangles will be 0-4 pixels because their normals are more than 60 degrees away from the eye vector (half of the visible hemisphere's surface lies beyond 60 degrees, and past that the cosine foreshortening more than halves the projected size).

(As an aside, how is a constant ratio with edge length in screen space significantly different from edge length / distance in world space?)
 
I'm not sure if I know what you're asking.
Well obviously, I said ignoring curvature (i.e. flat patches) and you pick something with the highest curvature imaginable :) Better to cut up surfaces such that the curvature stays a little more limited whenever possible.
(As an aside, how is a constant ratio with edge length in screen space significantly different from edge length / distance in world space?)
An oblique patch will not get a large tessellation factor on the edges which are small in perspective.
 
An oblique patch will not get a large tessellation factor on the edges which are small in perspective.
I'd think that would lead to artefacts where the contour visibly changes as a model rotates, i.e. details popping in and out as polys are inserted/removed...
 
Other than the hierarchical ring buses with snoop directories (YUCK), what would have changed?
Texturing system?

I got the distinct feeling from the original paper that it's where the most serious risks lie for basic graphics performance.

Jawed
 
Here's a thought:
In order to amortize control overhead, the ratio of front-end silicon to back-end vector units will be changed.
To provide more deterministic latency, the texturing blocks will be broken down into smaller groups and each would be linked with a core.

Each Larrabee core will have its VPU capacity increased to 5 16-wide units.
To save on silicon costs, only one of the vector SIMDs will be capable of the more expensive range of operations, while the other four will cover a subset of the most common ones.
;)
 
Here's a thought:
In order to amortize control overhead, the ratio of front-end silicon to back-end vector units will be changed.
To provide more deterministic latency, the texturing blocks will be broken down into smaller groups and each would be linked with a core.
:smile:

Sounds pretty reasonable.

Each Larrabee core will have its VPU capacity increased to 5 16-wide units.
To save on silicon costs, only one of the vector SIMDs will be capable of the more expensive range of operations, while the other four will cover a subset of the most common ones.
;)
DP:SP will go from 1:2 to 1:5

To save communication overhead, the cache hierarchy will become semi-coherent: the ring buses will be mutually incoherent while maintaining intra-ring coherency.

There will be a tessellator, tri setup and rasterizer in each core to speed up dedicated operations.

To reduce x86 overhead, it will be replaced by an ARM Cortex-A8.

To reduce the area dedicated to cache per core, it will be replaced by a dedicated LS mapped to a static address in the overall memory space.
 
Here's a thought:
In order to amortize control overhead, the ratio of front-end silicon to back-end vector units will be changed.
To provide more deterministic latency, the texturing blocks will be broken down into smaller groups and each would be linked with a core.

Each Larrabee core will have its VPU capacity increased to 5 16-wide units.
To save on silicon costs, only one of the vector SIMDs will be capable of the more expensive range of operations, while the other four will cover a subset of the most common ones.
;)

I c wat u did thar :p I also slightly punched myself when the solution to x86 "overhead" was using an ARM Cortex-A8 (rpg's, not yours)...
 
Larrabee is supposed to bin all the triangles before rendering, right? How would that work with tessellation? Is there a thread that discusses this?
 
Larrabee is supposed to bin all the triangles before rendering, right? How would that work with tessellation? Is there a thread that discusses this?

I'd love to stand corrected, but I'm under the impression that the LRB driver decides whether and where it'll act like an immediate-mode or a deferred renderer :?:
 
Larrabee is supposed to bin all the triangles before rendering, right? How would that work with tessellation? Is there a thread that discusses this?

I don't think they ever disclosed details of their DX11 pipeline implementation.
 
Larrabee is supposed to bin all the triangles before rendering, right? How would that work with tessellation? Is there a thread that discusses this?
Of course, if Intel wants, they'll use the 512-bit LNI vector unit as a software tessellator.
 
Btw, is LRB1 really 100% x86 (x64?) compliant, so that it could theoretically run any Windows program, or is it just a feature/cost-reduced architecture that allegedly originates from the old Pentium core?
 