Actually, I haven't heard about "Larrabee" in the media for months. I was searching the web for any news about Larrabee and found none... Still, I found some stuff that may interest you:
Better multicore energy conservation on mobile device with virtualization
A programming model for heterogeneous x86 platforms
Array building blocks: a flexible programming model for multicore and manycore architectures
Knights Ferry at 32nm is just 32 cores @ 1.2GHz with only 500 GFLOPS DP, less than today's Tesla C2070, while a POWER7 4-chip MCM does 1 TFLOPS DP.
So how much do you believe 22nm can give, when the only argument is that LRBni is easier to program?
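Quick back-of-envelope (the 8-wide DP vectors with FMA are my assumption about LRBni, not a published spec):

#include <stdio.h>

int main(void) {
    const int    cores         = 32;
    const double clock_ghz     = 1.2;
    const int    flops_per_clk = 8 * 2;  // assumed: 8 DP lanes x (mul + add)

    printf("peak DP: %.1f GFLOPS\n", cores * clock_ghz * flops_per_clk);
    // ~614 GFLOPS peak, so ~500 GFLOPS usable is in the right ballpark.
    return 0;
}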
There are quite a few papers out there on software micropoly rendering, and Intel's parallel setup was pretty fast, so I don't think software rasterization is the problem.
Instead, I think it's the way that tile based rendering needs either gobs of bandwidth with tessellation (binning all the polys and the dynamically generated vertex data), or lots of geometry workload duplication (tessellating each patch for the initial pass and again for every tile it gets binned into).
Tessellation is a great way to amplify data while avoiding bandwidth consumption (assuming it's done right; Cayman's spilling into memory isn't needed with a proper architecture). However, that's only the case if you immediately render the triangles rather than defer them.
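Rough numbers on that amplification (illustrative sizes, not from any real implementation):

#include <stdio.h>

int main(void) {
    const int  patch_bytes = 16 * 16;  // 16 control points x float4
    const int  tess_factor = 64;       // DX11's maximum edge factor
    const long verts       = (long)tess_factor * tess_factor;  // roughly 4K vertices
    const int  vert_bytes  = 32;       // position + a few attributes

    long out_bytes = verts * vert_bytes;
    printf("amplification: %ldx (%d B in, %ld B out)\n",
           out_bytes / patch_bytes, patch_bytes, out_bytes);
    // Rendering immediately keeps those bytes on chip; deferring
    // (binning) them turns the amplification into memory traffic.
    return 0;
}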
Hardly. True, there are a few micro-polygon setup papers out there, but they're not for Larrabee, and IIRC none of them mention how to use the method efficiently in a tile-based renderer. The closest thing I found is Reyes-like: sorting patches into tiles before tessellation. But how to compute a tight yet conservative bound for a patch in a graphics API, without a hint from the user, is an open question. DX11 is just designed for immediate-mode GPUs.
I believe software is always smarter than hardware. If hardware can implement something correctly, so can software.
The problem is that software is rarely optimized for bandwidth alone. In the CPU world, the bandwidth/compute ratio is almost always higher.
Let's say tessellating/displacing a patch much more than twice is a dumb idea. Then an implementation must always spill the vertices out to memory for neighboring tiles, just like Reyes. That sounds dumb too.
Or let's say spilling into memory is a dumb idea. Then, even if no single patch overlaps a tile boundary, every patch has to be tessellated/displaced exactly twice because of the sorting; otherwise it's unclear which tiles it touches.
TBDR relies on one assumption: pixel R/W bandwidth is larger than primitive parameter W/R bandwidth. As we all know, a pixel is much smaller (in bytes) than a vertex, so once triangles approach pixel size, I can see no efficiency win for a TBDR. Maybe it's me that's dumb.
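To put rough numbers on it (purely illustrative sizes: 8 bytes per pixel, 32 bytes per binned vertex):

#include <stdio.h>

int main(void) {
    const int bytes_per_pixel  = 4 + 4;    // RGBA8 color + 32-bit depth
    const int bytes_per_vertex = 16 + 16;  // float4 position + attributes
    const double pixels = 1920.0 * 1080.0;

    // With micropolygons, roughly one binned vertex per pixel,
    // written during binning and read back per tile.
    double pixel_traffic  = pixels * bytes_per_pixel;        // one R/W pass
    double vertex_traffic = pixels * bytes_per_vertex * 2.0; // write + read

    printf("pixel traffic : %.1f MB\n", pixel_traffic  / 1e6);
    printf("vertex traffic: %.1f MB\n", vertex_traffic / 1e6);
    return 0;
}

Once vertex count approaches pixel count, the binned parameter traffic dwarfs the framebuffer traffic the TBDR was supposed to save.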
That's a common practice for software parallel renderers already, IMO.
Well, for a TBDR to win with highly tessellated geometry, it has to dump the compressed version of the geometry (i.e. the raw patches) to memory after spatial binning and tessellate them on chip. Offhand, I can't see any other way a TBDR wins.
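Roughly this flow (just a skeleton; every type and function name below is made up):

// Bin raw patches, then tessellate on chip, tile by tile.
typedef struct { float cp[16][4]; } Patch;        // raw control points only
typedef struct { Patch *items; int count; } Bin;  // patches binned to one tile

void tessellate_and_shade(const Patch *p, int tile);  // hypothetical, on-chip
void resolve_tile(int tile);                          // hypothetical, writes pixels

void render_frame(Bin tile_bins[], int num_tiles) {
    for (int t = 0; t < num_tiles; ++t) {
        for (int i = 0; i < tile_bins[t].count; ++i)
            // A patch spanning N tiles sits in N bins and gets
            // tessellated N times (the duplication discussed above).
            tessellate_and_shade(&tile_bins[t].items[i], t);
        resolve_tile(t);  // the framebuffer is touched exactly once
    }
}

Only the small control-point records ever hit DRAM; the amplified vertices stay in on-chip memory.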
Assuming it can be done efficiently by a clever hw/sw combination, a TBDR can certainly win, and win big, with ginormous tessellation.
They say pretty clearly in the abstract itself that they can handle arbitrary shaders and arbitrary displacement maps.
But I'm highly suspicious about whether the paper describes a proven technique. Especially: can it handle arbitrary real-world vertex shaders, with arbitrary displacement map functions? The paper seems uncertain about that, too. Plus, analyzing/converting a displacement map into an interval texture in real time, within at most a fraction of a millisecond in the driver, seems impossible to me. And don't ignore the case where the displacement map is procedurally generated every frame, as in water rendering and simulations.
Water might be an exception, but caching shaders should work fine in a large number of cases.
Anyway, the method described is quite novel, and IMO it's better suited to an offline renderer than to a program that has to be very responsive: a graphics driver. DX11 provides no way for the user to hint the graphics driver about this, which is why I consider it an immediate-mode API. Users (us graphics programmers) know our data and shaders best, not graphics drivers. Maybe OpenGL could be extended to do better, but in Larrabee's case it has to work well with DX11.
That's not correct. Knights Ferry is 45nm.
They say pretty clearly in the abstract itself that they can handle arbitrary shaders and arbitrary displacement maps.
And just how realistic is that scenario?
I'll be very curious how they reached this conclusion.
For one thing, they assume "differentiable functions". What if a function is not differentiable? Something as simple as step(x,y) in HLSL is not differentiable. And, taken to an extreme, what about an integer vertex shader with bitwise logical instructions?
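A toy illustration (my own C sketch, not the paper's code) of why this hurts interval/Taylor-style bounding: once the input range straddles the edge, the only safe bound for step() is the whole [0,1], and there is no derivative at the edge to expand:

#include <stdio.h>

typedef struct { float lo, hi; } Interval;

// Interval version of HLSL's step(edge, x): 0 if x < edge, else 1.
Interval step_interval(float edge, Interval x) {
    if (x.hi <  edge) return (Interval){0.0f, 0.0f};
    if (x.lo >= edge) return (Interval){1.0f, 1.0f};
    return (Interval){0.0f, 1.0f};  // conservative but useless for culling
}

int main(void) {
    Interval x = {-0.1f, 0.1f};  // patch domain straddling the edge
    Interval b = step_interval(0.0f, x);
    printf("bound: [%g, %g]\n", b.lo, b.hi);  // prints [0, 1]
    return 0;
}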
And just how realistic is that scenario?
That doesn't take away from my point. Tessellation hurts a TBDR more than an IMR. Taking into account all the advantages of a TBDR and the inefficiencies of Larrabee's generalized architecture: if it was barely competitive in DX9, it would be much less so in DX11.
Geometry will always cost more on a TBDR than on an IMR. There's no way around that. So no, it can't win big with high tessellation unless it's the pixel workload that is giving it the advantage (or, naturally, if you only give the TBDR an optimization that is equally valid for the IMR).
Assuming it can be done efficiently by a clever hw/sw combination, a TBDR can certainly win, and win big, with ginormous tessellation.
They say pretty clearly in the abstract itself that they can handle arbitrary shaders and arbitrary displacement maps.
True, but interval textures aren't free. And how about random value functions using bit masks? Procedural noise? It's an interesting paper, though.
And just how realistic is that scenario?
Sure, it's not realistic, but it's possible. As a graphics driver, you have to support every possibility the specification allows. You can't fail to compile a program, for example, just because it's unrealistic or doesn't make sense.
No offense, but what about the Taylor series of this function:
float InvSqrt(float x) {
    float xhalf = 0.5f * x;
    int i = *(int*)&x;              // reinterpret the float's bits as an integer
    i = 0x5f3759df - (i >> 1);      // magic constant: initial guess for 1/sqrt(x)
    x = *(float*)&i;                // bits back to float
    x = x * (1.5f - xhalf * x * x); // one Newton-Raphson refinement step
    return x;
}
It's creativity.
On the other hand, culling done by the developers themselves is much easier: everything is under control. If someone's culling didn't make sense, you could just fire the guy.
Culling by devs helps a TBDR and an IMR equally.
You can almost cull patches quite easily in the hull shader. Just give it low tessellation factors, so that it's cheap to discard.
Think about a crazy idea: by offering a new SV_TileRect to the hull shader, the user could cull the patch when a bounds test against it fails. For a TBDR, this value would be the rectangle of the current tile; for an IMR, the viewport.
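Something like this, in plain-C terms (pure speculation; Rect, patch_bounds() and the 16.0f factor are all invented here):

// Hypothetical plumbing for the SV_TileRect idea.
typedef struct { float x0, y0, x1, y1; } Rect;

// Conservative screen-space bound of the untessellated patch,
// computed however the implementation likes (hypothetical).
Rect patch_bounds(const void *patch);

// Hull-shader-side test: return a tessellation factor of 0 to cull
// the patch when its bound misses the tile (TBDR) or viewport (IMR).
float cull_or_tess_factor(const void *patch, Rect sv_tile_rect) {
    Rect b = patch_bounds(patch);
    int outside = b.x1 < sv_tile_rect.x0 || b.x0 > sv_tile_rect.x1 ||
                  b.y1 < sv_tile_rect.y0 || b.y0 > sv_tile_rect.y1;
    return outside ? 0.0f : 16.0f;  // 0 culls the patch in DX11
}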
You can almost cull patches quite easily in the hull shader. Just give it low tessellation factors, so that it's cheap to discard.
This may also be of interest: "Efficient Bounding of Displaced Bézier Patches", a 2010 paper by some of the same authors.
Thanks for the great paper!
As an intermediate step, I believe it can indeed make sense to keep the IGP around for a while. Software is only slowly becoming more generic, and extending the vector processing capabilities of the CPU while retaining a minimal-cost, adequate IGP would be a really low-risk way to prepare for the future without compromising legacy graphics.
Nick, while augmenting the throughput of the CPU cores (simple or complex ones), don't you think it could be interesting to keep a "tiny/lesser GPU"?
I mean, you pointed out earlier in the thread that not so long ago vertex processing was still handled by the CPU on Intel platforms. How about moving pixel shading there too (like DICE is doing on the PS3)?
Basically, you'd put in a tiny GPU with what is by today's standards a "fucked up" ALU:TEX ratio (i.e. plenty of texturing power versus compute power). Assuming most 3D engines are moving to more and more deferred techniques, the GPU (a modern one) would act as a "deferred renderer accelerator"/"render target filler". Could that make sense?