Larrabee: Samples in Late 08, Products in 2H09/1H10

Larrabee is primarily a rasterization-based solution. I'm not sure where all these raytracing rumors come from. Sure, the hardware could do raytracing (the hardware *is* general-purpose after all), but that isn't what Larrabee will mostly be doing.
They'll likely sell it as a GPU to cover R&D costs, but in my opinion that's not Larrabee's biggest potential. I agree with others here, though, that it will take a 32 nm Larrabee II to deliver an interesting rasterization solution *and* to show what that other potential is.

By the way, I'm not sure if anyone has pointed out yet how easy it might be to scale Larrabee. Intel could keep using the same architecture and manufacture it on 32 nm as soon as the fabs are operational, and the same is true for 22 and 16 nm. GPUs, on the other hand, are redesigned from scratch every major generation and are always a step behind on silicon technology.

Another advantage of keeping the same ISA is that the same software can be used and kept evolving. RAM bandwidth, embedded L3 ZRAM cache, issue width, thread count, etc.: it all just becomes a parameter in the driver/compiler software. And the software can even adapt itself to the application's behavior.
 
By the way, I'm not sure if anyone has pointed out yet how easy it might be to scale Larrabee. Intel could keep using the same architecture and manufacture it on 32 nm as soon as the fabs are operational, and the same is true for 22 and 16 nm. GPUs, on the other hand, are redesigned from scratch every major generation and are always a step behind on silicon technology.
I have no idea what you're talking about. Are you implying the following chips are all-new architectures and have major differences in their RTL: NV15, G71, G92, etc.? RV370, RV610, etc.?

Larrabee's advantage here is exactly zero. Sometimes GPU manufacturers decide their first new major product on a new process (full node, not half node) will be a new architecture, but that hasn't been the case since NV30 on NV's side and since R520 on ATI's side. Of course, Intel does have the advantage of always being ahead of TSMC.

They can't afford to have 32nm Larrabee going online much after the 32nm shrink of Nehalem though, because my current expectation is that we'll see 32nm GPUs in 4Q10 or 1Q11. And TSMC's 45nm process has an obvious density advantage over Intel's (which probably doesn't do more than compensate for Intel's speed advantage, though). Of course, that's all based on preliminary TSMC roadmaps and things could change.
 
I'm not going to argue for or against that here, but the fact of the matter is it's completely unrelated to Voltron's claim. Why? Because the cost difference happens on the income statement *before* operating expenses, so it excludes R&D!

I am not so sure about that. Operating expenses appear on the income statement. Capital expenses do not. The idea is that you park the capital expense on the balance sheet and reduce it as an expense over time. The yearly portion of amortization must be an operating cost on the income statement. R&D is considered a capital expense and therefore will be placed on the balance sheet. Eventually, though, it will be zeroed out by charging it on the income statement. I am not too sure how much latitude they have in the schedule, though.
 
As for ArchitectureProfessor's claim that CPUs and GPUs sell at the same price: remember that discussion of wafer prices we had? Either way, G80 (which is a 480 mm² chip on 90nm) sold for an ASP of $125 or so...

The interplay between cost, price, and profit margin makes this sort of thing hard to analyze (as you're well aware). Yet, from what you've said, a mid-range GPU chip is cheaper than a mid-range CPU.

So, I wonder what accounts for the difference?

The GPU is a 90nm part, whereas the CPU is a 65nm part. Generally, 65nm parts are more expensive. The quad-core CPUs I quoted were actually two chips glued together in the same package. One would assume that you could still get a dual-core version of the same chip for less than the $280 the quad-core cost. Perhaps customers are just used to paying more for CPUs than for "add-on" GPUs?

A few years ago (before the whole GPGPU thing), Intel saw GPUs as a non-threat and a lower-margin business. Their opinion was that burning high-end fab capacity on GPUs would make them less money than building more CPUs. With the advent of Larrabee, that thinking is clearly changing inside Intel...
 
I don't know all the details on the internal architecture of G80, but I would be surprised if it's actually a SIMD pipe and not really 16 independent pipes, each with register offsets but a single instruction sequencer...
What are the effects of the distinction you're making here? I don't understand the consequences of the two approaches you've described.

Jawed
 
In fact, ironically, when you do some of these things with rasterization, the techniques used start to show some striking resemblance to raytracing...

I think that is a key point. I'm not a graphics expert, but from the reading I've done (mostly since starting to post on this thread), it seems that rasterization and raytracing have different strengths and weaknesses, and, as you said, advanced techniques of both start to show some striking resemblances. Who knows, perhaps in a decade or so this raytracing vs. rasterization duality won't really exist.

One more question for you all. I know the early RenderMan tools used by Pixar and others used rasterization. Is that still the case? Have they moved more toward raytracing at this point? If off-line computer-animated movie production is still using rasterization, why would GPUs move away from it?
 
R&D is considered a capital expense and therefore will be placed on the balance sheet.
Nope, that is NOT correct in the US. A quick Google search gives me this document, with this quote:
In fact, a reasonable argument can be made that research and development expenses (R&D) are more long term than investments in physical plant and equipment at many firms, especially those in the pharmaceutical and high technology sectors. Thus, it follows that R&D expenses should be treated as capital expenditures. In reality, however, accounting standards in the United States require the treatment of R&D as operating expenses.
The rest of the document gives some ways to try to minimize the impact of that, but AFAIK the kind of R&D Aaron spoke of is 100% classified as 'Operating Expenses' at both NVIDIA and Intel.
 
By the way, I'm not sure if anyone has pointed out yet how easy it might be to scale Larrabee. Intel could keep using the same architecture and manufacture it on 32 nm as soon as the fabs are operational, and the same is true for 22 and 16 nm. GPUs, on the other hand, are redesigned from scratch every major generation and are always a step behind on silicon technology.
Isn't the definition of a "major generation" that something significant was changed?

If GPUs seem to go through more major generations than CPUs, and it's not clear that they go through significantly more, it's because their internal designs were kept pretty well hidden until recently.

There is no tie between the newness of the designs and their process nodes.
GPUs are behind because they don't use Intel's fabs, which is pretty much the same problem every other silicon product is going to face.
Intel's tick-tock strategy seems counter to your perceived trend as well.

It's also not the case that CPUs don't go through some significant design work after optical shrinks. Even a dumb shrink at any node past 90nm requires a reworking of the circuitry, even if the higher-level design is unchanged.
Failure to do so has led to problems.

The work of implementing a design has increased to the point that Intel's CPUs go through major architectural changes every two years, which doesn't leave much room for GPU redesigns to seem all that excessive.

Another advantage of keeping the same ISA is that the same software can be used and kept evolving. RAM bandwidth, embedded L3 ZRAM cache, issue width, thread count, etc.: it all just becomes a parameter in the driver/compiler software. And the software can even adapt itself to the application's behavior.
Not keeping the same ISA hasn't stopped GPUs from rapidly evolving.
The key point is that they didn't expose the ISA to developers directly, though CTM and CUDA bypass the API and might slow design evolution from this point forward.

Since GPUs rely on driver compilers, the need for backwards compatibility in consumer graphics is reduced. Larrabee is nothing new in this regard.
 
When Intel talks publicly about 3D graphics, they talk about raytracing and how much better it is compared to rasterization. This makes people believe that their upcoming hardware is built to do raytracing and not rasterization.

There are some "visionaries" at Intel that just like to blow smoke about why consumers will need faster processors. So, some people go around saying... "So what will take up lots and lots of CPU cycles.... I know, raytracing! Yea, that's the ticket. Highly parallel (good for our multicores) yet not that easy to do on a GPU. Perfect."

I personally haven't seen the same sort of hype coming from people closely involved with the visual computing group. I've heard some talk that some of the more advanced lighting models (which use some raytracing-like methods) could be done on Larrabee, but that sits very much alongside game physics in the "and other cool things with Larrabee" bin rather than "this is what we're betting the farm on".

As far as I know, Larrabee is targeted at the same rasterization D3D/OpenGL pipeline that all the other GPUs are. Larrabee might well do some more advanced culling, sorting, grouping for texture and frame-buffer locality, or adaptive anti-aliasing based on edge detection (all just speculation on my part). Such algorithms are more irregular than current GPU algorithms (and thus a better fit for Larrabee), but it is still rasterization.
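To make that locality idea a bit more concrete, here is a minimal sketch of the kind of tile binning a software rasterizer could do before shading. Everything in it (the 64-pixel tile size, the structures, the function names) is my own illustration of the general technique, not anything known about Larrabee's actual pipeline:

```cpp
#include <algorithm>
#include <vector>

// One screen-space triangle; coordinates are already in pixels.
struct Tri { float x0, y0, x1, y1, x2, y2; };

const int TILE = 64;  // assumed tile size in pixels (pure guess)

// Bin each triangle into every tile its bounding box touches, so that all
// shading/blending for one tile can later run with that tile resident in cache.
void bin_triangles(const std::vector<Tri>& tris, int width, int height,
                   std::vector<std::vector<int>>& bins) {
    int tilesX = (width  + TILE - 1) / TILE;
    int tilesY = (height + TILE - 1) / TILE;
    bins.assign(tilesX * tilesY, std::vector<int>());

    for (int i = 0; i < (int)tris.size(); ++i) {
        const Tri& t = tris[i];
        // Conservative bounding box, clamped to the screen, in tile units.
        int minX = std::max(0, (int)std::min({t.x0, t.x1, t.x2}) / TILE);
        int maxX = std::min(tilesX - 1, (int)std::max({t.x0, t.x1, t.x2}) / TILE);
        int minY = std::max(0, (int)std::min({t.y0, t.y1, t.y2}) / TILE);
        int maxY = std::min(tilesY - 1, (int)std::max({t.y0, t.y1, t.y2}) / TILE);

        for (int ty = minY; ty <= maxY; ++ty)
            for (int tx = minX; tx <= maxX; ++tx)
                bins[ty * tilesX + tx].push_back(i);  // triangle i touches this tile
    }
}
```

Whether Larrabee actually bins like this is pure speculation on my part too, but it's exactly the sort of irregular, pointer-chasing work that's awkward on current GPUs and natural on a general-purpose core.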
 
One more question for you all. I know the early RenderMan tools used by Pixar and others used rasterization. Is that still the case? Have they moved more toward raytracing at this point? If off-line computer-animated movie production is still using rasterization, why would GPUs move away from it?

They only use raytracing when they need it. Go to http://graphics.pixar.com/; there are a few articles that discuss it. What they said is that the time to evaluate their shaders is longer compared to the raytracing part.
 
As I said, it really doesn't make sense to run 16 in parallel. With the vector structure, it would make much more sense to do a 4-pixel quad and time-division multiplex between 4-pixel quads based on external latencies.

Also, in general, SIMD is ALU-only, while loads and stores are done via normal loads and stores, i.e., one at a time. This has been pretty standard since the introduction of SIMD instructions into mainstream CPUs, so I don't see any reason why that would change. I pretty much have no knowledge of what they are planning as far as SIMD extensions for Larrabee.

I don't know all the details on the internal architecture of G80, but I would be surprised if it's actually a SIMD pipe and not really 16 independent pipes, each with register offsets but a single instruction sequencer...

As for NV and ATI, I think you can extract a lot from the CUDA and CTM stuff, and G80 looks to be 8-way SIMD internally.

I had considered that Larrabee would try to do 4 shader programs 4-wide in 16-wide SIMD, but,

For one thing you would need lots of permutation instructions (a complex non-x86-style ISA with a 16-wide vector, eating ALU slots which could be doing actual work), and (now especially with unified shaders) programs often do a lot of work on scalars, pairs, and tuples as well. You would need to mix SoA and AoS in the same compiled code. I think ALU efficiency would suffer. Look at the history of doing AoS with current x86 SSE instructions: in most cases simply doing scalar ops is faster, because all the instructions needed to move values around negate the advantage of going parallel (however, with 2-wide DP this isn't always the case, because SSE has separate hi/lo load/store, and a splat load as well).
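To illustrate that AoS-vs-SoA point with today's SSE (my own toy example, nothing Larrabee-specific): the AoS dot product spends most of its instructions shuffling and horizontally adding, while the SoA version computes four dot products with nothing but useful math.

```cpp
#include <xmmintrin.h>  // SSE
#include <pmmintrin.h>  // SSE3: _mm_hadd_ps

// AoS: one (x, y, z, w) vector per register. The horizontal adds do no
// "new" math; they just move data around to produce a single scalar.
float dot_aos(__m128 a, __m128 b) {
    __m128 m = _mm_mul_ps(a, b);   // x*x, y*y, z*z, w*w
    m = _mm_hadd_ps(m, m);         // x*x+y*y, z*z+w*w, repeated
    m = _mm_hadd_ps(m, m);         // full sum broadcast to every lane
    return _mm_cvtss_f32(m);
}

// SoA: a[0..3] hold the x, y, z, w components of four vectors; likewise b.
// Four independent dot products, and every instruction is real arithmetic.
__m128 dot4_soa(const __m128 a[4], const __m128 b[4]) {
    __m128 r = _mm_mul_ps(a[0], b[0]);
    r = _mm_add_ps(r, _mm_mul_ps(a[1], b[1]));
    r = _mm_add_ps(r, _mm_mul_ps(a[2], b[2]));
    r = _mm_add_ps(r, _mm_mul_ps(a[3], b[3]));
    return r;                      // 4 dot products in one register
}
```

That tension presumably gets worse, not better, at 16 wide.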

Now if you decided to use the SSE regs as an extra 4-scalar-programs-wide (SoA) path, then you would effectively have both a 4-wide vector path and a scalar path per shader program, but there would be all sorts of complexity in dealing with passing register values between the SSE regs and the 16-wide vector regs. So my guess is that this wouldn't happen (just too messy and limited from an ISA standpoint).

The basic concept is that it would look to the programmer like you have "16 independent" scalar programs running in parallel, but it would actually be SIMD under the hood, with predicates masking off scalar slots in the SIMD vector when the programs diverge at a branch, and with texture fetch instructions that gather from 16 independent locations when filling one of the physical 16-wide registers.
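Here is a rough scalar-emulated sketch of that execution model, just to pin down the mechanics; all the type and function names (vec16f, mask16, gather, and so on) are invented for illustration and are not real Larrabee instructions:

```cpp
// 16 "independent" shader programs, really one SIMD batch plus a predicate mask.
struct vec16f { float v[16]; };
struct vec16i { int   v[16]; };
typedef unsigned short mask16;   // one active/inactive bit per program

// Texture-fetch-style gather: each active program reads from its own address.
vec16f gather(const float* base, const vec16i& idx, mask16 active) {
    vec16f r = {};
    for (int i = 0; i < 16; ++i)
        if (active & (1u << i)) r.v[i] = base[idx.v[i]];
    return r;
}

// Shader source "if (x > 0) y = a; else y = b;": both sides execute,
// and the predicate decides which lanes actually commit results.
void branch_example(const vec16f& x, const vec16f& a, const vec16f& b,
                    vec16f& y, mask16 active) {
    mask16 taken = 0;
    for (int i = 0; i < 16; ++i)
        if ((active & (1u << i)) && x.v[i] > 0.0f) taken |= (1u << i);
    mask16 notTaken = active & ~taken;

    for (int i = 0; i < 16; ++i) {
        if (taken    & (1u << i)) y.v[i] = a.v[i];  // "then" lanes
        if (notTaken & (1u << i)) y.v[i] = b.v[i];  // "else" lanes
    }
}
```

Swap each of those inner loops for a single masked vector instruction and you get the SIMD version; to the shader writer it still looks like 16 scalar programs.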

I still don't see any more efficient way to run DX10-style shader programs on Larrabee.
 
As for the raytracing vs. rasterization thing: raytracing loses its great scalability on dynamic geometry (you need to rebuild the ray-intersection acceleration structures).
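For a feel of that cost, here is a minimal sketch (my own illustration, not from any real ray tracer) of the cheapest thing you can do when geometry moves: refit every bounding box in an existing BVH, bottom-up, every frame, before tracing a single ray. A full rebuild, which you eventually need as tree quality degrades, costs even more.

```cpp
#include <algorithm>
#include <vector>

struct AABB    { float lo[3], hi[3]; };
struct BVHNode { AABB box; int left, right, firstTri, triCount; };  // leaf if triCount > 0

AABB merge(const AABB& a, const AABB& b) {
    AABB r;
    for (int i = 0; i < 3; ++i) {
        r.lo[i] = std::min(a.lo[i], b.lo[i]);
        r.hi[i] = std::max(a.hi[i], b.hi[i]);
    }
    return r;
}

// O(n) pass over the whole tree per frame, children before parents.
void refit(std::vector<BVHNode>& nodes, int idx, const std::vector<AABB>& triBounds) {
    BVHNode& n = nodes[idx];
    if (n.triCount > 0) {                                  // leaf: bound its triangles
        n.box = triBounds[n.firstTri];
        for (int i = 1; i < n.triCount; ++i)
            n.box = merge(n.box, triBounds[n.firstTri + i]);
        return;
    }
    refit(nodes, n.left, triBounds);                       // inner node: recurse,
    refit(nodes, n.right, triBounds);                      // then merge the children
    n.box = merge(nodes[n.left].box, nodes[n.right].box);
}
```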

Besides, with NVidia owning MentalRay, I wouldn't be surprised if, when ray tracing does become practical for real-time, NVidia has dedicated hardware to do this, which will probably easily outperform a software ray tracer...
 
Now if you decided to use the SSE regs as an extra 4-scalar-programs-wide (SoA) path, then you would effectively have both a 4-wide vector path and a scalar path per shader program, but there would be all sorts of complexity in dealing with passing register values between the SSE regs and the 16-wide vector regs.

From what I've heard, Larrabee doesn't even have SSE registers (just 64-bit scalars and 512-bit vectors). Strange, but likely true.

I still don't see any more efficient way to run DX10-style shader programs on Larrabee.

I agree. I think having 16 shader invocations running in parallel in SIMD/vector style, using vector masks and such, might work really well for Larrabee. This is the mode of execution that would most resemble what the G80 currently does. In this case, the software driver for Larrabee that translates the shader programs would be responsible for creating the code that emulates the G80 SIMD style using Larrabee's vectors. Seems reasonable to me...

This might be fine for shaders. However, I still get the feeling there is some secret sauce in the Larrabee vector units that allows it to get away with less fixed-function hardware than a GPU. My impression was that these special vector instructions were customized for graphics (in essence, fixed-function units operating on vectors behind an instruction interface). I just don't know enough about the graphics pipeline to reverse-engineer what they might be.
 
...I wouldn't be surprised if, when ray tracing does become practical for real-time, NVidia has dedicated hardware to do this, which will probably easily outperform a software ray tracer...

This is what I love about this thread. Fundamentally we're debating the role of special-purpose vs general-purpose hardware and which makes sense where. Of course, that is a moving target, but it is really fun to debate.

Even if rasterization-based 3D graphics becomes dominated by the general-purpose aspects of the computation (which has been my position in this thread), maybe there will be an inflection point at which raytracing can be done, but only with special GPU hardware. In that case, the trend might swing back toward special-purpose hardware (whereas the trend right now is toward more general-purpose GPU hardware). It should be interesting to watch.
 
This might be fine for shaders. However, I still get the feeling there is some secret sauce in the Larrabee vector units that allows it to get away with less fixed-function hardware than a GPU. My impression was that these special vector instructions were customized for graphics (in essence, fixed-function units operating on vectors behind an instruction interface). I just don't know enough about the graphics pipeline to reverse-engineer what they might be.

Maybe they're going that route because they have to design around the patents for specialized hardware that the GPU manufacturers already have in place.

No point in ramping up Larrabee just to have it smacked with an injunction for infringing on an Nvidia or ATI patent.
This would be orthogonal to whether it's good for performance, however.
 
Bah! I was thinking about the next NVIDIA raytracing monster based on CUDA 2 and MentalRay... the "RayForce"... but that name already exists and it's trademarked (already in use in the raytracing industry)!

http://www.rayforce.net/

Now they'll have to think of another name for the MentalRay-HW team! Mwahahah!
 
Maybe they're going that route because they have to design around the patents for specialized hardware that the GPU manufacturers already have in place.

Unlikely. From what I understand, Intel engineers are *forbidden* from even looking at patents. Why? If you're knowingly infringing, then it's 3x damages (part of the patent-law statutes). To avoid those 3x damages, they just disallow any of their engineers from looking at any patents. Clever, eh? I've never heard of Intel making a technical design decision based around a patent issue.

Plus, Intel has such a large patent portfolio, I don't think NVIDIA would want to get in a pissing match with Intel on patents. The counter-suit would likely drag both of them down, and neither of them would be a clear winner.

Most of the time, hardware patents are used either (1) by big companies to squash little companies or (2) by little companies without products (only patents) to sue a bigger company. A variant of #2 is when a university sues a big company. You just don't see much hardware patent litigation between big players (few IBM-vs-Intel-vs-AMD-size battles). In such cases, only the lawyers win (well, and the academics who serve as expert witnesses, but I digress).
 
Unlikely. From what I understand, Intel engineers are *forbidden* from even looking at patents. Why? If you're knowingly infringing, then it's 3x damages (part of the patent-law statutes). To avoid those 3x damages, they just disallow any of their engineers from looking at any patents. Clever, eh?

That's really more or less standard industry practice.
 