Larrabee delayed to 2011?

Discussion in 'Architecture and Products' started by rpg.314, Sep 22, 2009.

  1. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    I know what the demo was too, SGEMM ... how is that relevant?
     
  2. dkanter

    Regular

    Joined:
    Jan 19, 2008
    Messages:
    360
    Likes Received:
    20
    What I'm saying is that I know what the demo hardware was. It wasn't an 80 core R&D experiment.

    DK
     
  3. keritto

    Newcomer

    Joined:
    Apr 3, 2009
    Messages:
    143
    Likes Received:
    0
I believe you when you say it isn't smart or easy to mess with the engine itself, but it also isn't smart to just push transistor counts up without looking at what could be done inside a core. And ATi knows it, with the recent push to bring a native tessellation engine into the core. We all saw how building around an old chip design ends up: a several-times-renamed G300 performer now known as :lol: Fermi.

ATi has pretty much exhausted the original idea behind R600, with the once-rudimentary tessellator (which never really lived on R600) finally upgraded into a native tessellation engine for DX11. So for further advancement beyond FMA they should widen their SIMD core to 256-bit (288), and 4 S-units + 4 T-units could handle 4x 32b FP + 4x 32b INT, similar to nV. A reduction to only 4 SPs per SIMD (with only 2 T-units, "SFU") doesn't seem appealing. Btw, all developers need upgrades from time to time; I know R600 is still rather young and hasn't paid off yet, but I believe that widening the 160-bit bus to 288-bit wouldn't hurt them much. Except for the well-known hard-to-develop-a-proper-compiler problems AMD had when it first came out. But at least they'd have some development time until Larrabee is out.

Maybe the simpler 5D idea is better, but then again it also needs a wider 200-bit bus instead of today's 168-bit, and maybe 5 T-units is utilization overkill; perhaps my 4T idea, or the original R800 idea with 2 T-units per SP, is just the sweet spot. I just go along with the nV idea of having the same number of INT- and FP-capable units :oops:


So you think the HPC market battle is not enough :mrgreen: I just hope they won't kill nVidia on that ground and later acquire it. I don't recall whether they still claim that LB will support VT-x? And I still don't see any new Itaniums on the horizon, looking back a few years. So, as I already stated somewhere, LB could be the next great engine itself. They used to bring the good Pentium/D stuff over to Itanium 2, but since Core 2 every part of that work has vanished from public view.


Yep, but I find it hard to believe that a delayed Fusion at 32nm will have only basic DX10 support in 2011 :mrgreen: I'd bet on DX11 based on today's R800 (not R600 as AMD originally claimed when they promised Fusion chips for the beginning of 2009, never mind that they're basically the same), with some 800 SPs like the RV840/Juniper XT has, just on 32nm; and based on 32nm high-k it could consume only 40W at the same clocks as today's JXT (850MHz). And we might see some DX11.1 (11+) compliance when R900 (+FMA) arrives, probably next year, not too long after Fermi's introduction. If AMD is really AssetSmart. They have the knowledge, provided the R&D on Cypress pays off quickly enough, because they're in no rush considering roughly equal performance for their HD5950 vs. its Fermi counterparts.

So I'd say AMD, if they're smart enough, have just the right system requirements with a 4-core Bulldozer and some 800 SPs by its side on 32nm. The other question is whether programmers will avoid them as they usually did when AMD first introduced 3DNow! (2 years before SSE in the P3) and x86-64 (in reality 3 years ... before a real 64-bit part came as Conroe, aka Core 2).
     
  4. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    "This Puppy I have here, this is Larrabee. Oh - wait. No. It's Polaris reloaded, which also had 80 cores." *SCNR*
     
  5. keritto

    Newcomer

    Joined:
    Apr 3, 2009
    Messages:
    143
    Likes Received:
    0
Aren't voxelized objects worse when you need to render their interactions, while with simple pixelized objects we have simple, well-developed techniques to easily add a real-life look to their interactions? Aren't voxels just a quick patch for rendering complex 3D objects on hardware that is really doing it in 2D space?

And Larrabee is running cold as polar ice and not wasting any energy? :mrgreen: It seems not only hype about proposed LB performance is flying around, but also hype about how green LB's performance per watt is. Is that why the estimated LB TDP was 300W, without it being clearly stated whether that's for the 45nm revisions or for older 65nm ones, if any existed? And is that excellent TDP the reason Intel is waiting for a straight-to-32nm Larrabee release while cleaning out some performance bugs?


And the real reason for it is ... G80's quasi-DX10 support, or rather that they supported it as far as they could on the pretty advanced and still relatively new G70 architecture (which was 9.0c, btw). So the lack of features, and all that mumbling about DX10.1 numbers (missing from nV), meant that not everything is in the numbers themselves, as Charlie tries to remind us, but in nVidia's ability to release their G80 chip 6 months before the competition and blackmail MegaCorps into kicking what was supposed to be real DX10 support out of their future Vista release, putting in something that only goes by the name DX10 in the much-anticipated, badly performing future MegaCorps OS.

So in fact we were deprived of all that DX10 had to offer on the R600 architecture in favor of nV's DX10 deal, which was nothing more than eye-candied DX9.0c, as you state. In fact we should have gotten something much like today's DX11, only 2.5 years earlier. And DX11, well, it did finally bring a new tessellation engine after all.


You couldn't just pass up pointing out how nothing really new is there in DX11 :grin:

But you forgot to emphasize that even on DX10 hardware you can program tessellation through Compute Shaders on ATi-based cards, and DX10, as I mentioned in the reply above, was exactly what should have introduced the new hardware tessellation capability itself. Meanwhile DX11 and the R800 series brought some texture compression improvements, with no performance drop for the old compression methods. And tessellation itself heavily reduces the memory bandwidth needed to produce the same rasterization setup. So I'd say these are pretty big things ATi introduced in its DX11 engine. I see you're still looking for a good reason to tell your friends to upgrade to DX11 :D


It's not nice to see how people wear polarized glasses for different weather. The reason DX11 was so lightweight for ATi to implement is that rudimentary tessellation had pre-existed on the ATi VLIW engine since their first DX10 chip (R600), so all the basic tessellation setup was there; all they needed was extra R&D to improve it, implement it in Compute Shader algorithms, and get it supported by Microsoft in DX11. It's ATi's architectural design, maybe even too much in-situ, and that couldn't be provided by the DX11 API from MegaCorps itself or any third party.

I just hope the same VLIW engine can carry the additional capability to cope with the next thing on the horizon, FMA. I hope ATi won't turn a lack of FMA into a marketing war, as Huang has been doing for the last year and a half amid constant R&D troubles with GT200 and now G300. And not to forget, all that indirection capability of G300 (Fermi, renamed after Larrabee->Cypress) doesn't look extremely appealing unless it's as inexpensive in silicon and research as further tessellation or compression advancements are.
     
  6. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
Why do you keep harping on FMA? Single-cycle-throughput FMA doesn't matter for FLOPS, because it doesn't go any faster than the single-cycle-throughput MAD that GPUs have been using forever ... also, ATI has said R800 supports FMA anyway.
     
  7. Bob

    Bob
    Regular

    Joined:
    Apr 22, 2004
    Messages:
    424
    Likes Received:
    47
    I've tried to look for a single true sentence in keritto's post, but I just can't find one. Can someone help me out?
     
  8. PeterT

    Regular

    Joined:
    May 14, 2002
    Messages:
    702
    Likes Received:
    14
    Location:
    Austria
    You're ahead of me, I can't even find an intelligible sentence.
     
  9. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
Larrabee at ISSCC 2010?

    Apologies if this has been posted before.

    From the ISSCC 2010 program:

    5.7 A 48-Core IA-32 Message-Passing Processor with DVFS in 45nm CMOS

    567mm^2 in 45nm, 125W TDP.

    It says 48 cores in the headline, but in the text it is said to be organized as a 6x4 mesh. What gives?

    Cheers
     
  10. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
No, it is unrelated to Larrabee.
     
  11. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
    It is almost assuredly Polaris II, a research project into multi-cores.

    -Charlie
     
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
  13. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
  14. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    I like it, pity it's too late for Larrabee.
     
  15. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
I never liked full hw cache coherency, especially when you have O(50) cores on a die. For >100 cores, forget it. Shared mutable state is the bane of parallel sw, so why keep support for it in hw?
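A toy software sketch of the alternative being argued for here: each core owns its data outright and communicates only by messages, so correctness never depends on a coherency protocol. Goroutines and channels stand in for cores and an on-die mesh (the worker/routing names are mine, purely illustrative):

```go
package main

import "fmt"

func main() {
	const cores = 4
	in := make([]chan int, cores)
	for i := range in {
		in[i] = make(chan int, 1) // per-core inbox, the only transport
	}
	done := make(chan int)

	// Each "core" accumulates into private state; nothing is shared.
	for i := 0; i < cores; i++ {
		go func(id int) {
			sum := 0
			for v := range in[id] {
				sum += v // owned exclusively by this goroutine
			}
			done <- sum
		}(i)
	}

	// Route work by message, not by writes to shared memory.
	for v := 0; v < 8; v++ {
		in[v%cores] <- v
	}
	for i := range in {
		close(in[i])
	}

	total := 0
	for i := 0; i < cores; i++ {
		total += <-done
	}
	fmt.Println(total) // prints 28 (0+1+...+7)
}
```

No shared mutable state exists here, so there is nothing for hardware coherency to track; the trade-off is that any genuinely shared structure must be partitioned or copied explicitly.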
     
  16. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
2 IA cores share each mesh node/routing logic, so the 6x4 mesh gives 6 x 4 x 2 = 48 cores; see Intel's press release material.
     
  17. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,563
    Likes Received:
    171
    Location:
    In the Island of Sodor, where the steam trains lie
    So, the Transputer/Occam model of 20 years ago was probably the correct one.
     
  18. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
Because in certain cases it is fantastically useful: think big shared data structures where reads/queries vastly outnumber updates.

The problem, of course, is when cache coherency is used as the sole mode of data transport between cores; many-core systems will choke on coherency probes and invalidate broadcasts.

    We need both, IMHO.

    Cheers
     
  19. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
Has anybody noticed that Larrabee is no longer branded as a GPU, but as a
"computational co-processor for the Intel Xeon and Core families"?
     
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    Sigh, if only...

    I wonder how many transputer cores you could fit in 3 billion transistors :razz:

    Jawed
     