Intel's Single Chip Cloud Computer

Discussion in 'Architecture and Products' started by Jawed, May 24, 2010.

  1. dkanter

    Regular

    Joined:
    Jan 19, 2008
    Messages:
    360
    Likes Received:
    20
    How can LRB have a 'stock' clock, when there are no SKUs?

    You're missing my point. I think they probably would have traded off density for better power consumption.

    I also don't think it necessarily means breaking TDP. Depends on your cooling. If you cool well, you lower Tj and your transistors run cooler, consume less power, etc. etc.

    Penryn hit markets in late 2007. If LRB had hit markets in late '08 (which would have taken a miracle), it probably would have been much better. They would have had a density advantage, a performance advantage, a big power advantage (HKMG + normal process shrink gains) and perhaps, just perhaps, those advantages could have made up for other issues.

    However, that's clearly impossible scheduling-wise. A more interesting question is: what if they had skipped 45nm and gone straight to 32nm at the end of 2010? Again, while their competition was still on 40nm. They'd have a half-node density advantage, mature yields, a substantial transistor performance advantage, etc.

    The point is that Intel should have figured out how to leverage their process technology advantages more heavily (to make up for software disadvantages), rather than falling victim to an overly aggressive schedule and coming to market at parity.

    David
     
  2. DavidC

    Regular

    Joined:
    Sep 26, 2006
    Messages:
    347
    Likes Received:
    24
    Windsor is 90nm though.

    Yeah, you're right, the values are wrong.

    Brisbane 154 million, 126mm2

    Still, not that impressive, because Isaiah performs far worse than Windsor.
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Yet it appears to be rather important as the basis of quite a few "forward looking" multi-core projects.

    Maybe it was a "too simple to fail" ethos?

    L1 bandwidth is vast, so I suppose it's then a question of L2 bandwidth. EDIT: just noticed that BLAS3 is shown with >50x scaling at 64 cores in figure 20 of Seiler - although that's based on 1GHz cores and we don't know how core clock and ring bus clocks scale, nor what off-chip bandwidth is like. Though that graph is based solely on a simulation.
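
    For a sense of what that figure implies (a back-of-envelope sketch, assuming the ~50x-at-64-cores number read off the graph): parallel efficiency is just speedup over core count, and inverting Amdahl's law gives the serial fraction the simulation would imply.

    ```c
    /* Back-of-envelope check on the >50x BLAS3 scaling in Seiler's figure 20.
     * Inputs are approximate values read off a graph, not measurements. */
    #include <stdio.h>

    int main(void) {
        double S = 50.0;  /* observed speedup (approximate) */
        double N = 64.0;  /* simulated core count */

        /* Amdahl: S = 1 / ((1-p) + p/N), solved for the parallel fraction p. */
        double p = (1.0 - 1.0 / S) * N / (N - 1.0);

        printf("parallel efficiency: %.1f%%\n", 100.0 * S / N);         /* ~78%   */
        printf("implied serial fraction: %.2f%%\n", 100.0 * (1.0 - p)); /* ~0.44% */
        return 0;
    }
    ```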

    But yes, it seems inexcusable to have struggled to reach 1 TFLOP single precision like that.

    Jawed
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    How much of its problems are to do with software (drivers) though?

    I get the distinct impression that of the hardware and software bites that Intel took, the software one was a tougher chew. Larrabee is essentially dependent upon per-state optimisation for traditional D3D/OGL pipeline rendering. Separating tasks by category and partitioning categories to cores gets them so far...

    Jawed
     
  5. EduardoS

    Newcomer

    Joined:
    Nov 8, 2008
    Messages:
    131
    Likes Received:
    0
    Yeah, Isaiah sucks, but I was just pointing at its density, especially the density of its cache. VIA would need many more billions to make a decent processor, but as an example of density it's fine.
     
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    So what you're basically saying is that the chump change it would have cost Intel to do the job properly just wasn't worth doing?

    I'm not trying to use transistor density as an absolute metric, just a first approximation. We don't have much of a good idea how big the chip is...

    So what you're saying is that Intel can only do CPUs. Everything else is scrap that occupies production lines that would otherwise be rotting.

    Everyone's power-limited. That's no excuse, it's the norm.

    I don't see how Larrabee, if it launched summer 2009, could have been competitive in terms of die size/power - it was rumoured to be GTX285 performance, best-case. Also I wouldn't take it as read that it was 45nm. Back then there were real doubts it could be, on the basis that Intel was 45nm capacity constrained.

    But I've always had the view that the performance didn't matter until version 3 (as seen by consumers). Intel didn't have to compete for the halo with the first couple of iterations, in my view.

    Jawed
     
  7. dkanter

    Regular

    Joined:
    Jan 19, 2008
    Messages:
    360
    Likes Received:
    20
    Um, no offense, but you do realize that skilled circuit designers who can work with a high performance 45nm process don't grow on trees... even in Oregon?

    Designing custom circuits takes time, people and money. Money cannot buy you people, and it sure as hell can't buy you time. Intel's advanced circuit teams had probably gone off to work on other things by then... like designing SRAM or PLLs for 22nm.

    Now what would you do? Take away resources from an essential and undoubtedly profitable project and put them on a questionable one?

    I'm saying that other products have to use the process that is optimized for CPUs, yes. Although with 32nm, there is the SOC process which is much better suited to SOCs. So ironically, a 32nm LRB might have turned out much better.

    Summer is too early, I was thinking 4Q.

    Intel has to sell LRB at at least a mild profit.

    David
     
  8. DavidC

    Regular

    Joined:
    Sep 26, 2006
    Messages:
    347
    Likes Received:
    24
    How did we go from SCC to Larrabee? Of course, I won't deny there's a connection. The "development platform" role that SCC plays probably could have been filled by Larrabee instead. At least some of the lessons they learn from SCC will certainly migrate to Larrabee as well.

    Yes, it probably had hardware problems as well. Intel can't always avoid that.

    The funny thing is that on the SCC they promote cache coherency via software schemes, while on Larrabee it was all hardware. I wonder if that will change in a future Larrabee?

    Just trying to be on topic. :)
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Depends what proportion of the circuit/node optimisation effort is computational, I'd say. I wouldn't expect Intel to be researching new stuff specifically for Larrabee/SCC, but some kind of libraries for re-use seem unavoidable.

    Optimisation and verification should be mostly a computing problem, not a manpower problem, shouldn't it?

    Is there any evidence that 32nm works that well?

    So, even less competitive.

    I've never honestly expected anyone at Intel to presume the first chip would be competitive as a halo product. A simple graph over time of the performance of ATI and NVidia chips shows the folly. They were not due to slam into the wall for quite a while after the Larrabee project started - though NVidia's considerably closer with its architecture.

    Eventually, yes.

    Anyway, I'm now convinced that discrete GPUs have about 5-7 years left. The lion's share of the revenue growth is in integrated and mobile, and the tide of performance there is rising very rapidly - even if there's a 20x range between HD5870 and the crap on an 890G motherboard. Llano and Sandy Bridge are both going to put a huge dent in that range. So, arguably, Larrabee as a discrete board was doomed anyway.

    I certainly don't think Larrabee's principles are doomed - just they'll show up somewhere else. This does reinforce the idea, though, that it was always going to be treated like a runt within Intel.

    Atom seems to have been somewhat "accidental". While it was aimed at being low power, it was hardly optimal in its first incarnation, was it?

    Jawed
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    There's also ring versus mesh and explicit message routing coupled with dedicated message buffers versus cache-coherency-based messaging in Larrabee.

    It seems that Intel's prowess is solely in cutting-edge, high-performance consumer CPUs. It's almost as if Intel has painted its fabs into a corner, where it can't do anything else well, other than these consumer CPUs. (Well, that's where the revenue is, isn't it?...)

    Well, Larrabee's scheme is based on hardware steered by cache intrinsics: locked lines, temporal hints at different cache levels, etc. It's not pure hardware.

    The message passing buffers in SCC could easily live inside locked lines in L2... The MPBs are still software managed (e.g. allocation per node within each local MPB). Though the MPBs are tile-shared, not core-local, so Intel is amortising routing costs over two cores. Why not 4 cores? etc.
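
    For flavour, here is a minimal single-threaded sketch of that software-managed pattern: each sender owns a flag-plus-payload slot in the receiver's MPB, and the receiver polls it. This is not the SCC's real programming interface (that's Intel's RCCE library); every name and size below is hypothetical, and the flush is a no-op stand-in for the cache-line flush a real non-coherent MPB requires.

    ```c
    /* Illustrative sketch, NOT the real SCC/RCCE API: message passing through
     * a shared buffer that hardware does not keep coherent. Each sender owns
     * one slot in the receiver's message-passing buffer (MPB) and signals
     * with a flag the receiver polls. Names and sizes are hypothetical. */
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define NCORES  48
    #define PAYLOAD 31                 /* one 32-byte slot: 1 flag + 31 bytes */

    typedef struct {
        volatile uint8_t flag;         /* 0 = empty, 1 = full */
        uint8_t payload[PAYLOAD];
    } slot_t;

    /* On the real chip each tile's MPB is a distinct on-die SRAM; here it is
     * just an array. mpb[dest][src] is src's slot in dest's buffer. */
    static slot_t mpb[NCORES][NCORES];

    /* Stand-in for the cache-line flush/invalidate the real hardware needs,
     * since the MPB sits outside the coherence domain. */
    static void flush_line(volatile void *addr) { (void)addr; }

    void mpb_send(int src, int dest, const void *msg, size_t len) {
        slot_t *s = &mpb[dest][src];
        while (s->flag) flush_line(s);            /* wait for slot to drain */
        memcpy(s->payload, msg, len);
        s->flag = 1;                              /* publish the message */
        flush_line(s);
    }

    void mpb_recv(int dest, int src, void *msg, size_t len) {
        slot_t *s = &mpb[dest][src];
        while (!s->flag) flush_line(s);           /* poll for arrival */
        memcpy(msg, s->payload, len);
        s->flag = 0;                              /* release the slot */
        flush_line(s);
    }

    int main(void) {
        char out[PAYLOAD];
        mpb_send(3, 7, "hello", 6);               /* core 3 -> core 7 */
        mpb_recv(7, 3, out, 6);
        printf("%s\n", out);
        return 0;
    }
    ```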

    I started the thread specifically to compare and contrast these chips, not to be solely about SCC.

    Jawed
     
  11. DavidC

    Regular

    Joined:
    Sep 26, 2006
    Messages:
    347
    Likes Received:
    24
    Yes, it definitely wasn't optimal; Moorestown will improve it a lot. Though it was the first in a line of LPIA products. If we look at Larrabee that way, the follow-ups could have been impressive.

    Gotcha. :)

    Yes, but they are related in that they are both many-core products. SCC is made as a learning platform for future many-core products, and Larrabee is targeted towards high flops and extremely parallel apps. If the future Larrabee derivatives change a lot, they might just adopt SCC's cache coherency scheme. Who knows? Maybe they'll decide it turns out better.

    Just my 2 cents.
     
  12. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    The thread title has just shocked me, it's terribly funny nonsense.
    I might as well build a Single Computer Computer Network :D
     
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    If something is overclocked, as the 1 TFLOP demo was stated as requiring, I assumed there was some base clock to go over.

    I suspect that would have been secondary to the convenience (necessity?) of trying to get a pipeline first designed to run at ~200 MHz to the 1.5-2 GHz range.

    The range of cooling solutions for the target market is what it is. Perhaps as a boutique product, Intel could have afforded to splurge for Larrabee more so than either Nvidia or AMD do, Fermi having perhaps the highest-class stock cooler and the most generous case requirements in quite some time.


    RV770 was released in mid '08 and it was denser than the Larrabee that was put on display. It was barely edged out by the overclocked Larrabee in SGEMM.
    I would have been curious how the 600mm2 chip would have performed, given its 2.5x die size advantage.

    Is this more of a political and product-based question?
    The product lines with first dibs on the leading-edge processes did not seem particularly interested in Larrabee coming to the party. The latest announcement of an HPC initiative including Larrabee is actually a change of pace, since Intel, or one portion of it, had ruled that out earlier. The proposed socket version of Larrabee was ruled out by one of the execs linked with the Xeon lines.

    It would make sense; the idea of a 600mm2 chip sold for a fraction of the price of a high-end Xeon, with 5-10 times the performance for certain workloads, would give Intel's high-margin lines a headache.

    My interpretation of the spokesman's statements was that this was not the case.

    Mostly the ones where the core itself is secondary to the primary focus of the design, or perhaps the project was not really high enough priority to warrant the resources to make a new one.
    For this latest project, it is the messaging network.

    For Larrabee, it was the VPU and cache, and I always got the subjective impression it did not rate that highly (Intel's schizophrenic raytracing/rasterizer statements; the high end, no volume, no, I mean high end target market; the wandering target performance level; the use of a core at clocks way above its original design envelope).

    As far as the first product is concerned, maybe it was more of a "do enough to just get it working but not enough to make anything else we make money from look bad" ethos.
     
  14. cho

    cho
    Regular

    Joined:
    Feb 9, 2002
    Messages:
    416
    Likes Received:
    2
  15. Npl

    Npl
    Veteran

    Joined:
    Dec 19, 2004
    Messages:
    1,905
    Likes Received:
    6
    Hmm, x86 cores without an FPU? That means no backwards compatibility with existing code.
    Wouldn't it be way more interesting to look into existing solutions than Intel's research projects (or at least look into them first)? I don't see what's new there except the blessing and curse of the x86 legacy.
    E.g. there is the TILEPro64 with 64 (32-bit) cores on 90nm, and the Tile-Gx100 with 100 (64-bit) cores on 45nm, scheduled to arrive in 2011.
    I would love to see a technological breakdown of those before looking at research projects that may or may not materialize someday.
     
  16. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    You can run FPU code on a processor with no FPU, such as a 486SX or earlier: exception handling catches the FP instruction and integer-based emulation kicks in, successfully running the instruction (that takes something like 2,000 cycles).
    Old stuff.
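
    Roughly, that trap-and-emulate path has the shape sketched below, assuming a kernel-side handler. With CR0.EM set, any x87 opcode raises #NM (device-not-available, vector 7) and lands in the handler, which decodes the instruction and emulates it with integer code. The types and soft-float helpers here are hypothetical stubs; a real emulator (e.g. the Linux math-emu code) decodes the full D8..DF escape-opcode space.

    ```c
    /* Sketch of an #NM trap handler emulating x87 instructions in integer
     * code. fpu_state_t and the extern helpers are hypothetical; only the
     * opcode encodings (D8 C0+i = FADD st(0),st(i), D8 C8+i = FMUL) are real. */
    #include <stdint.h>

    typedef struct {
        uint64_t  st[8];       /* emulated x87 register stack (raw bits) */
        int       top;         /* stack top pointer */
        uintptr_t ip;          /* faulting instruction pointer */
    } fpu_state_t;

    /* Hypothetical integer-only soft-float helpers. */
    extern uint64_t softfloat_add(uint64_t a, uint64_t b);
    extern uint64_t softfloat_mul(uint64_t a, uint64_t b);
    extern uint8_t  fetch_byte(uintptr_t addr);

    /* Decode the x87 opcode, emulate it, skip the instruction, resume.
     * Hundreds to thousands of cycles per op, hence the 486SX slowness. */
    void handle_device_not_available(fpu_state_t *f) {
        uint8_t op  = fetch_byte(f->ip);       /* x87 escape opcodes: D8..DF */
        uint8_t mod = fetch_byte(f->ip + 1);

        switch (op) {
        case 0xD8:
            if ((mod & 0xF8) == 0xC0)          /* FADD st(0), st(i) */
                f->st[f->top] = softfloat_add(f->st[f->top],
                                              f->st[(f->top + (mod & 7)) & 7]);
            else if ((mod & 0xF8) == 0xC8)     /* FMUL st(0), st(i) */
                f->st[f->top] = softfloat_mul(f->st[f->top],
                                              f->st[(f->top + (mod & 7)) & 7]);
            break;
        /* ...remaining escape opcodes D9..DF elided... */
        }
        f->ip += 2;                            /* resume past the instruction */
    }
    ```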
     
  17. brain_stew

    Regular

    Joined:
    Jun 4, 2006
    Messages:
    556
    Likes Received:
    0
    Anandtech have a new article up, and it seems as though both this and the Larrabee project are merging, with the aim of hitting the HPC space with a 22nm chip codenamed "Knights Corner".

    http://www.anandtech.com/show/3749/intel-mic-22nm-50-cores-larrabee-for-hpc-announced
     
  18. larrabee

    Newcomer

    Joined:
    Dec 21, 2009
    Messages:
    29
    Likes Received:
    0
    Optimization of design layouts requires strong intuition from skilled engineers. A lot of time can be wasted trying every layout, and a lot of performance can be gained, or vice versa. It's hard to guess or predict the time needed for testing and designing a chip, and Larrabee is very different from anything Intel has ever built, so it's definitely a lot of work.
     
  19. spacemonkey

    Newcomer

    Joined:
    Jul 16, 2008
    Messages:
    163
    Likes Received:
    0
  20. bbot

    Regular

    Joined:
    Apr 20, 2002
    Messages:
    705
    Likes Received:
    4
    Intel Knights Corner? Why bother making a 50-core chip that has a performance of 1 teraflops, single-precision? Wasn't the cancelled 32-core IBM Cell chip going to have the same performance? And doesn't the AMD Cypress(?) GPU already have a performance of 2.72 teraflops, single-precision?
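
    Those peak numbers all fall out of the same formula: units x lanes per unit x flops per lane per cycle (2 for multiply-add) x clock. A quick sketch: the Cypress line reproduces the 2.72 TFLOPS figure cited above (320 VLIW5 units at 850 MHz), while the 50-core line assumes a Larrabee-style 16-wide FMA VPU with the clock chosen purely to land on 1 TFLOPS.

    ```c
    /* Peak single-precision throughput from the usual formula. The Cypress
     * numbers are public specs; the 50-core entry is an assumed configuration
     * (16-lane SP VPU with FMA, clock back-solved to hit 1 TFLOPS). */
    #include <stdio.h>

    static double peak_gflops(int units, int lanes, int flops_per_lane,
                              double ghz) {
        return units * lanes * flops_per_lane * ghz;
    }

    int main(void) {
        /* Cypress: 320 VLIW5 units = 1600 lanes, FMA, 850 MHz. */
        printf("Cypress:  %.0f GFLOPS\n", peak_gflops(320, 5, 2, 0.850));

        /* Hypothetical 50-core Larrabee derivative: 16-wide SP VPU with FMA.
         * A ~625 MHz clock already puts it at the quoted 1 TFLOPS. */
        printf("50 cores: %.0f GFLOPS\n", peak_gflops(50, 16, 2, 0.625));
        return 0;
    }
    ```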
     