"If SPUs were 16 wide."
Which is another of the "SPU v2 I would like" requests nAo was making (increased vector width) IIRC.
"The article didn't have much meat to it, given the amount of words expended."
You nailed it.
Meh ... if cache-to-cache communication is fast enough you can use neighbouring processors to work on large vectors, but it's impossible to split up a SIMD unit to handle divergent workloads.
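To make the "can't split up a SIMD for divergent workloads" point concrete, here is a minimal sketch (an added illustration, not code from the thread; 4-wide SSE stands in for Larrabee's 16-wide unit). With the lanes locked together, a per-element branch becomes "evaluate both paths for every lane and blend with a mask", so every lane pays for paths it doesn't take.

#include <xmmintrin.h>

/* scalar version: each element takes exactly one of the two paths */
void scale_scalar(float *x, int n) {
    for (int i = 0; i < n; ++i)
        x[i] = (x[i] > 0.0f) ? x[i] * 2.0f : x[i] * 0.5f;
}

/* 4-wide SSE version: both paths are computed for every lane and a per-lane
 * mask selects the result (n is assumed to be a multiple of 4) */
void scale_simd(float *x, int n) {
    const __m128 zero = _mm_setzero_ps();
    const __m128 two  = _mm_set1_ps(2.0f);
    const __m128 half = _mm_set1_ps(0.5f);
    for (int i = 0; i < n; i += 4) {
        __m128 v    = _mm_loadu_ps(x + i);
        __m128 mask = _mm_cmpgt_ps(v, zero);   /* per-lane predicate */
        __m128 a    = _mm_mul_ps(v, two);      /* "then" path, all lanes */
        __m128 b    = _mm_mul_ps(v, half);     /* "else" path, all lanes */
        _mm_storeu_ps(x + i, _mm_or_ps(_mm_and_ps(mask, a),
                                       _mm_andnot_ps(mask, b)));
    }
}

Larrabee's vector ISA did the select with dedicated mask registers rather than this and/andnot dance, but the cost model is the same: the wider the SIMD and the more divergent the data, the more lanes sit idle.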
"Raytracing just for instance is not a reasonably data parallel workload (not for the hard stuff, i.e. non-primary/shadow rays)."
Right, but it's not clear to me that you can do much better than CPU-like designs for those highly irregular workloads anyway. 16-wide SIMD seems like a good sweet spot for most workloads.
"Raytracing is not very highly data parallel, but it's still massively parallel ... I'd still take a Larrabee over an i7 for this kind of problem, the more cores the better."
You can get quite a few i7 cores now in multi-socket systems, and out-of-order execution logic and higher frequencies make a big difference. If you're not using the SIMD, I would imagine that you're not going to do a lot better than a good CPU.
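As for raytracing being "massively parallel but not very data parallel", a toy sketch of what that means (my own illustration, not anyone's renderer): every ray is an independent task, so spreading rays across cores and hardware threads is trivial, but each ray does a different, data-dependent amount of traversal work, so 16 incoherent rays marched in lockstep through one SIMD unit all wait on the most expensive lane.

#include <stdio.h>

/* toy stand-in for per-ray traversal: the trip count depends on the input,
 * the way a real ray's BVH walk depends on where it goes in the scene */
static float trace_one(float seed)
{
    float t = seed;
    int steps = 0;
    while (t < 100.0f && steps < 100000) {   /* data-dependent loop */
        t = t * 1.0001f + 0.01f;
        ++steps;
    }
    return t;
}

int main(void)
{
    enum { NUM_RAYS = 1 << 16 };
    static float hit[NUM_RAYS];

    /* iterations are fully independent: any core, any hardware thread,
     * any order; this is the "massively parallel" part */
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < NUM_RAYS; ++i)
        hit[i] = trace_one(0.001f * (float)(i % 997));

    printf("sample result: %f\n", hit[0]);
    return 0;
}

The dynamic schedule is there because per-ray cost varies wildly, which is exactly the property that makes lockstep lanes a poor fit once rays stop being coherent.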
"The cores per dollar argument is pretty strong in Larrabee's favor."
Accepting the many assumptions in your post, sure, but we were just discussing the suitability of various architectures for MIMD-style code. The majority of the power of GPUs is in their SIMD units (Larrabee included, but obviously less than other GPUs) and they perform significantly less well with heavily divergent code/data structures.
"The assumption was that a Larrabee card was released at its target clocks and at its target price range. That's 32 ~2GHz cores, with a price ceiling of $500-600."
Not to dispute, but where are you getting these numbers from?
"With a Gulftown hexacore, the price of the chip alone can meet or exceed the board (by a lot with an extreme edition)."
You can't really consider the street price of these things when comparing architectural efficiency. I imagine the margins for CPUs and GPUs are pretty different.
"Intel's market segmentation charges a significant premium for higher socket counts and core counts."
Again, see above. While this may be relevant to an end user building a system, I was discussing overall efficiency of an architecture for a given workload.
"The 8-core scenario sounds the most cost effective, but that is a factor of 4 disadvantage in core count and a factor of 8 in thread count."
Not sure you can directly compare "thread count" like that between these architectures...
"Larrabee, for all its possible weaknesses, would have been a lot of silicon for a very depressed price thanks to what it was targeting."
That much I completely agree with. Just give some credit to CPUs where it is due.
Questions
- So what's the projected core count on i7/ix parts in 2012?
- What will the transistor budget for LRB3 be?
- If they are revising/streamlining the LRB core, what do you think the core count will be for LRB3 in 2012? 64 / 96 / 128?
- And more generally, if they target late 2012, what process will they make it on? 22nm? (It will probably need to be.)
"Not to dispute, but where are you getting these numbers from?"
The clock range for Larrabee was initially put out in slides that had it ranging from 1.5 GHz to 2.5 GHz. Granted, those slides were old and did not plan on a 32-core variant.
"You can't really consider the street price of these things when comparing architectural efficiency. I imagine the margins for CPUs and GPUs are pretty different."
Why can't I buy two Larrabees?
"Again, see above. While this may be relevant to an end user building a system, I was discussing overall efficiency of an architecture for a given workload."
Efficiency in what terms?
"Not sure you can directly compare 'thread count' like that between these architectures..."
It's 4 hardware threads per core for Larrabee, 2 for Nehalem. There are significant differences in implementation, but nearly an order of magnitude difference in total thread count should count for something.
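Spelling out the arithmetic with the numbers already assumed in this exchange (32 Larrabee cores against an 8-core Nehalem setup): 32 cores × 4 hardware threads = 128 threads versus 8 cores × 2 SMT threads = 16 threads, which is the factor of 8 in thread count (and 32/8 = 4 in core count) quoted above and is what "nearly an order of magnitude" refers to.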
"Why can't I buy two Larrabees?"
I'm not saying that at all... I'm saying cost to produce is more relevant for comparing the efficiency of a processor than cost to consumer (which includes profit margins).
Saying that an unbounded number of i7 cores in an unbounded number of sockets is not a fair comparison to Larrabee, which apparently is being assumed to be singular.
"It's 4 hardware threads per core for Larrabee, 2 for Nehalem. There are significant differences in implementation, but nearly an order of magnitude difference in total thread count should count for something."
The different hardware threads are just there to cover various latencies, etc. They cannot all execute an instruction in the same clock. See the Larrabee architecture paper: "Switching threads covers cases where the compiler is unable to schedule code without stalls. Switching threads also covers part of the latency to load from the L2 cache to the L1 cache, for those cases when data cannot be prefetched into the L1 cache in advance. Cache use is more effective when multiple threads running on the same core use the same dataset, e.g. rendering triangles to the same tile."

To consider the benefit of these "HW thread" implementations, then, you need to consider the memory architecture, which is very different between the two. Sure, Larrabee theoretically has twice as many hardware threads with which to hide latencies, but GPU memory latencies are typically far more than 2x longer than a CPU's.
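A back-of-envelope way to read that excerpt (my own sketch, with made-up round numbers, not Intel figures): if a thread can issue roughly W cycles of independent work before stalling for L cycles, keeping the core busy takes about 1 + ceil(L / W) round-robin threads. Four threads is plenty for L1/L2-class latencies and nowhere near enough for DRAM-class ones, which is why the excerpt leans on prefetching and on threads sharing a tile's working set.

#include <stdio.h>

/* threads needed so that while one thread stalls for `latency` cycles,
 * the others can fill the gap with `work` cycles of issue each */
static int threads_to_hide(int latency, int work)
{
    return 1 + (latency + work - 1) / work;   /* 1 + ceil(latency / work) */
}

int main(void)
{
    /* all cycle counts below are assumptions for illustration only */
    printf("~20-cycle L2 access, 10 cycles of work per thread: %d threads\n",
           threads_to_hide(20, 10));
    printf("~300-cycle DRAM miss, 10 cycles of work per thread: %d threads\n",
           threads_to_hide(300, 10));
    return 0;
}

On that model, Larrabee's 4 threads against Nehalem's 2 only buys extra tolerance if the latencies being hidden aren't proportionally longer, which is the memory-architecture caveat above.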