> I suppose since neither Larrabee nor at-cost Xeons and multi-socket boards are commercially procurable (unless you're building a top500 supercomputer), we can call this a draw.

I'm not saying that at all. I'm saying that cost to produce is more relevant for comparing the efficiency of a processor than cost to the consumer, which includes profit margins.
> 1/4 of the threads can execute a pair of instructions a clock, that is 64 instructions per clock on dual-issue cores. For the usage we are debating, this is more achievable since there is so little peak issue width to waste. For 8 i7 cores, some combination of threads can at most execute 32 in aggregate, maybe 40 if we get optimal macro-op fusion. This is less reachable and less relevant for a highly thread-parallel workload.

The different hardware threads are just there to cover various latencies, etc. They cannot all execute an instruction in the same clock. See the Larrabee architecture paper.
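The issue-rate figures being debated can be sketched as a back-of-envelope calculation. The 32-core Larrabee and 8-core (dual-socket quad) Nehalem configurations are assumptions for illustration, not disclosed specs:

```python
# Peak issue rates implied by the numbers above (core counts are
# illustrative assumptions: 32 Larrabee cores, 8 Nehalem cores).

larrabee_cores = 32        # assumed chip configuration
threads_per_core = 4       # 4 hardware threads per Larrabee core
issue_width = 2            # dual-issue in-order pipeline

# Only one thread per core issues in a given clock; with 4 threads
# per core that is 1/4 of all threads issuing at once.
threads_issuing = larrabee_cores * threads_per_core // 4
larrabee_peak = threads_issuing * issue_width      # 64 instructions/clock

nehalem_cores = 8          # dual-socket quad-core i7
nehalem_width = 4          # 4-wide issue per core
nehalem_peak = nehalem_cores * nehalem_width       # 32 instructions/clock
nehalem_fused = nehalem_cores * 5                  # ~40 with macro-op fusion

print(larrabee_peak, nehalem_peak, nehalem_fused)  # 64 32 40
```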
> Larrabee's not sufficiently threaded to cover any significant amount of memory latency, and neither is Nehalem.

To weigh the benefit of these "HW thread" implementations, then, you need to consider the memory architecture, which is very different between the two. Sure, Larrabee theoretically has twice as many hardware threads with which to hide latencies, but GPU memory latencies are typically far more than 2x longer than CPU latencies.
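A rough way to see why thread count alone doesn't settle this: if a thread does W clocks of useful work between misses of latency L, keeping the pipeline busy takes about 1 + L/W threads. The latencies below are illustrative guesses, not measured figures:

```python
# Threads needed to hide a miss, given work between misses.
# Latency/work values are assumptions chosen only to show the shape
# of the trade-off, not disclosed Larrabee or Nehalem numbers.

def threads_to_hide(latency_clocks, work_clocks):
    # 1 thread doing work, plus enough others to cover the stall
    return 1 + latency_clocks / work_clocks

print(threads_to_hide(400, 50))  # GPU-class latency: ~9 threads needed
print(threads_to_hide(150, 50))  # CPU-class latency: ~4 threads needed
```

Under these assumptions, Larrabee's 4 threads fall well short of GPU-class latencies, while Nehalem's 2 fall short even of CPU-class ones, which is the point of the quoted remark.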
Larrabee's hardware prefetch capabilities were never disclosed and may not have been fully developed. Given the penalty prefetching can impose on bandwidth-heavy workloads, I wouldn't expect it to be as aggressive as Nehalem's.
In terms of memory accesses, Nehalem has 48 entries in its load buffer, which across 8 cores is an aggregate 384 outstanding loads. Larrabee is in-order and would not attempt that level of memory speculation. At a bare minimum, I'd expect at least one outstanding load per thread, for at least 128.
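Little's law connects those outstanding-load counts to sustainable bandwidth: bytes in flight divided by the time each stays in flight. Note the caveat that load-buffer entries overstate true outstanding misses (miss-handling registers are the real limit), and the latencies below are assumptions:

```python
# Little's law sketch: bandwidth sustainable from outstanding misses.
# Latencies are illustrative assumptions; load-buffer entries are an
# upper bound on actual outstanding misses, not a measured figure.

line_bytes = 64  # cache line size

def sustained_gbps(outstanding, latency_ns):
    # bytes in flight / time each byte stays in flight; bytes/ns == GB/s
    return outstanding * line_bytes / latency_ns

nehalem_outstanding = 8 * 48     # 8 cores x 48-entry load buffer = 384
larrabee_outstanding = 128 * 1   # 128 threads x 1 outstanding load each

print(sustained_gbps(nehalem_outstanding, 60))    # assumed ~60 ns DDR3 latency
print(sustained_gbps(larrabee_outstanding, 200))  # assumed longer GDDR latency
```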
On the other hand, if Larrabee has a memory bus roughly equivalent to other GPUs', it would take between 2 and 4 Nehalems to match its bandwidth.
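The "2 to 4 Nehalems" figure falls out of typical numbers for the era; both values below are assumptions, not Larrabee specs:

```python
# Rough bandwidth comparison behind the "2-4 Nehalems" estimate.
# Both figures are era-typical assumptions, not disclosed specs.

nehalem_bw = 3 * 1333e6 * 8 / 1e9  # triple-channel DDR3-1333: ~32 GB/s
gpu_bw = 115.0                     # GDDR5 board in the ~100-150 GB/s class

print(gpu_bw / nehalem_bw)         # ~3.6 Nehalem sockets' worth
```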
> I've already made note of how horrid Larrabee would be on x86 code without vector instructions, in part because doing so leaves very little else it can run.

Again, I'm not really arguing with you per se, just pointing out that the real power of throughput architectures is the SIMD. They make trade-offs that leave them much less impressive when running (even multi-threaded) scalar C code.
However, even inefficient usage of the VPU, like say predicating 3/4 of the lanes off, would still give it some interesting strengths.
edit:
Even with 15/16 lanes off, if only because I don't know what other FP math support it would have.
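The predication point can be made concrete: each vector op still retires one op per active lane per clock, so even a mostly-masked VPU keeps scalar-FP-level throughput. The 16-lane width is Larrabee's; the mask fractions are the ones discussed above:

```python
# Useful FP throughput per clock under predication, for a 16-lane VPU
# issuing one vector op per clock. FMA is counted as 2 flops per lane.

lanes = 16

def useful_flops_per_clock(active_lanes, fma=False):
    # only unmasked (active) lanes do useful work each clock
    return active_lanes * (2 if fma else 1)

print(useful_flops_per_clock(lanes // 4))       # 3/4 of lanes off -> 4
print(useful_flops_per_clock(1))                # 15/16 off -> 1, scalar-FP rate
print(useful_flops_per_clock(lanes, fma=True))  # fully occupied FMA -> 32
```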
Nehalem could do an FP MUL, ADD, and store per clock.
Larrabee can do one VPU op and a vector store per clock.
If it's an FMADD, then each Larrabee core has similar throughput, although in more restricted circumstances.
If there is no dependent add, then it's either an FMUL and store, or an FADD and store.
That's 2/3 the issue capability of a Nehalem core, and in the case of a comparison against a dual-socket i7, that's with 4 times as many cores.
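Tallying the per-core, per-clock issue slots from the comparison above (the slot counts are from the argument in this post, and the 32-vs-8 core counts are the same illustrative assumption as before):

```python
# Per-core, per-clock FP-relevant issue slots as argued above.
# Slot counts follow the post's reasoning; core counts are assumed
# configurations (32-core Larrabee vs dual-socket quad-core i7).

nehalem_slots = 3        # FP MUL + FP ADD + store per clock
larrabee_fma_slots = 3   # FMADD (counts as MUL+ADD) + store
larrabee_nofma_slots = 2 # FMUL-or-FADD + store

print(round(larrabee_nofma_slots / nehalem_slots, 2))  # 0.67, i.e. 2/3

larrabee_cores, i7_cores = 32, 8
print(larrabee_cores / i7_cores)                       # 4.0x the core count
```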