Larrabee vs Cell vs GPU's? *read the first post*

Discussion in 'GPGPU Technology & Programming' started by rpg.314, Apr 17, 2009.

  1. steampoweredgod

    Banned

    Joined:
    Nov 30, 2011
    Messages:
    164
    Likes Received:
    0
    If there are exclusivity agreements with Sony regarding use in the console space, I doubt Sony would be interested in changing those terms.

    There's a difference between then, when Sony was expected to lead the console market by a vast margin and Cell appeared in state-of-the-art supercomputers, and now.

    The newest record-breaking supercomputers aren't Cell-based, IIRC.

    Now, with the market more evenly split, it wouldn't be the same to have the number-one supercomputer be Cell-based, especially at or near a new console launch, if there's any exclusivity agreement.

    No one's saying it has to be on the same chip. Put three 32-SPE Cells on a board; given that prior-generation Cell boards are in the greenest, most efficient supercomputer systems, such boards would likely compare favorably.

    Cell's local store was said to be more scalable than Larrabee's coherent caches.
     
    #101 steampoweredgod, Dec 6, 2011
    Last edited by a moderator: Dec 6, 2011
  2. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Are you seriously saying that even if there were demand for Cell or its derivatives, Sony would insist on exclusivity instead of collecting some royalties on top of sunk expenses?
     
  3. steampoweredgod

    Banned

    Joined:
    Nov 30, 2011
    Messages:
    164
    Likes Received:
    0
    Sony invested hundreds of millions, IIRC; I doubt any console competitor is going to pay that in addition to royalties.

    PowerVR GPU royalties are said to range from cents to a dollar per chip, and Sony pays Nvidia 5 dollars per chip, IIRC. At most a competitor can be expected to sell around 100M units over its 10-year lifetime, so with high, Nvidia-like royalties you could expect $500M of income over ten years. If CPU royalties are much lower, the expected income could fall under $100M over ten years. (One would presume this sum would subsequently be divided between IBM, Toshiba, and Sony.)
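
    To make the arithmetic concrete, here's a quick sketch; the per-chip rates are just the hearsay figures quoted in this post, not confirmed numbers:

    ```python
    # Back-of-envelope royalty income over a console's lifetime.
    # Rates below are the thread's hearsay figures, not confirmed.
    def lifetime_royalties(units_sold, royalty_per_chip):
        """Total royalty income = units sold times per-chip royalty."""
        return units_sold * royalty_per_chip

    high = lifetime_royalties(100_000_000, 5.00)  # Nvidia-like $5/chip -> $500M
    low = lifetime_royalties(100_000_000, 1.00)   # top-end PowerVR-like $1/chip -> $100M
    ```

    That gives the $500M and $100M ballparks quoted above, before any split between IBM, Toshiba, and Sony.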

    I don't see Sony handing over a heavy investment like Cell for pocket change, especially if it turns out to offer any serious advantage in the console space.
     
  4. Ninjaprime

    Regular

    Joined:
    Jun 8, 2008
    Messages:
    337
    Likes Received:
    1
    I don't know where you heard that, but I'm dubious of its truth. Current 45nm Cell chips have a TDP in the 45-50 watt range, with the SPEs taking up ~40% of the die, IIRC. That would mean the ~40% of the die that is doing most of the work is only generating ~20% of the heat.

    100 SPEs by themselves are paperweights without a 12x scaled-up EIB and 12 more PPEs to deliver the bandwidth and scheduling they need. Even assuming the 1-watt-each figure were true, 12 PPE cores and, more importantly, a 12x scaled-up EIB would be quite the power hog. This is also saying nothing of the die you're putting it on. 22nm KC is an actual product with an actual die; I don't know if you could even fit 100 SPEs within the limits of a normal die. 45nm Cell is 115mm^2, which puts the 8 SPEs at 46mm^2, or 5.75mm^2 each. So your 100-SPE chip is 575mm^2 of SPEs alone, near the limits of a die, with no memory interface, no EIB, no PPEs. Considering the EIB scales horribly, this is a major problem.
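
    For anyone who wants to check the die-area arithmetic, here it is spelled out, using only the figures from this post (115mm^2 die, SPEs ~40% of it):

    ```python
    # Die-area check from the post's own numbers.
    cell_die_mm2 = 115.0       # 45nm Cell die size
    spe_fraction = 0.40        # SPEs take ~40% of the die (per the post)
    spe_area = cell_die_mm2 * spe_fraction / 8  # per-SPE area: 5.75 mm^2
    hundred_spe_area = 100 * spe_area           # SPEs alone: 575 mm^2
    # ...and that's before adding any memory interface, EIB, or PPEs.
    ```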

    I'm not completely clear on what you're saying here. That a GPU flop is worth 1/5th of a normal CPU flop, maybe? That's certainly not true. Look at DGEMM benchmarks: while utilization might be lower on a GPU, it certainly isn't 1/5th. Fermi reaches about 60% of its theoretical peak in DGEMM, and KC is even better; KC is pretty well in line with a CPU, I think.
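
    As a sanity check on the ~60% claim: DGEMM efficiency is just sustained throughput over theoretical peak. The Fermi numbers below are approximate public figures (Tesla C2050: ~515 GFLOPS double-precision peak, roughly ~350 GFLOPS sustained in DGEMM), used only as an illustration:

    ```python
    # Efficiency = sustained FLOPS / theoretical peak FLOPS.
    def efficiency(sustained_gflops, peak_gflops):
        return sustained_gflops / peak_gflops

    # Approximate Fermi (Tesla C2050) double-precision figures, for illustration.
    fermi_eff = efficiency(350.0, 515.0)  # roughly two-thirds of peak, not 1/5th
    ```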

    Doesn't really matter. Latency is relative. Going from 10 nanoseconds to 10 microseconds might seem like a lot on paper, but to a game with 16,666 microseconds between frames, it's nothing. Want a real-world test of this? Look at Nvidia and PhysX. If latency mattered that much and GPU physics were impractical, PhysX wouldn't exist.
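
    The frame-budget arithmetic behind that claim is simple enough to write down (the 10 microsecond dispatch latency is a hypothetical figure from this discussion):

    ```python
    # Frame budget vs. dispatch latency at 60 fps.
    frame_us = 1_000_000 / 60        # ~16,666 microseconds per frame
    latency_us = 10.0                # hypothetical GPU dispatch latency
    fraction = latency_us / frame_us # well under 0.1% of the frame budget
    ```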

    That's a lot of qualifiers, and they're all wrong. A hypothetical 45nm 32-SPE Cell, as was once proposed before Cell died off, would maybe be the equal of an RV770 GPU from 2008 in flops crunching for physics.
     
  5. Ninjaprime

    Regular

    Joined:
    Jun 8, 2008
    Messages:
    337
    Likes Received:
    1
    Larrabee may be dead, but Knights Corner has risen from the ashes; it's already planned for a 10-petaflop supercomputer and has actual products with actual specs.
     
  6. steampoweredgod

    Banned

    Joined:
    Nov 30, 2011
    Messages:
    164
    Likes Received:
    0
    The second revision of Cell did around 100 GFLOPS, years ago.
    Note that I wasn't necessarily speaking of single-chip solutions. Three 32-SPE Cell chips should be viable and should not have extraordinary power requirements; as others commented, the local-store memory architecture scales far better than coherent caches in terms of power consumption and heat generation.

    IBM has said a crossbar can be swapped in for the EIB without problem, IIRC.
    In another thread it was mentioned that programmable flops were 1/5th of the total, the rest being fixed-function, though that thread is a bit old. I think it was the "is there anything the cell can do better than a modern cpu gpu" thread.

    Even if latency can adversely affect performance, physics can still be offered as long as the remaining performance is reasonable; it wouldn't necessarily make it impossible. It would only mean that a more suitable processor would offer that much more.

    A 10x slowdown, while still viable, can still mean that something 10x faster does 10x more; and if there are several orders of magnitude of difference in speed, we could hypothetically see orders of magnitude of difference in results. This could put a next-gen Cell anywhere from comparable to a GTX 560 to substantially exceeding the 2012 GPUs in physics, depending on how this affects physics calculations.

    This hints that even traditional CPUs might not be that bad at physics compared to GPUs.
     
    #106 steampoweredgod, Dec 7, 2011
    Last edited by a moderator: Dec 7, 2011
  7. upnorthsox

    Veteran

    Joined:
    May 7, 2008
    Messages:
    1,909
    Likes Received:
    232
    not quite

    http://www.realworldtech.com/page.cfm?ArticleID=RWT022508002434
     
  8. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    But we don't know much about it.
     
  9. Ninjaprime

    Regular

    Joined:
    Jun 8, 2008
    Messages:
    337
    Likes Received:
    1
    I meant more the 1-watt-each thing, though upnorthsox's link has resolved that. I searched for around 30 minutes trying to find a TDP for the 45nm Cell, and the only figure I could find was that it was "just below 50 watts." It was on RWT right there the whole time. /shrug

    Maybe, I'm not so sure. If an 8-SPE Cell chip at 45nm is 115mm^2, it stands to reason that a Cell chip with 4 times the SPEs would need 4x the PPE control cores, 4x the EIB, etc. While I don't think it would have to be 4x as big, I think it's fairly reasonable to expect it to be at least 3x as big, which is around ~350mm^2, 50% bigger than the original Cell at launch. Putting 3 of these on a board seems unlikely; you're talking over 1000mm^2 worth of dies on a board. Maybe a supercomputing board, but I thought this was about game physics?

    Yeah, but the original reason for using the EIB over a crossbar was that it's smaller and scales better, so that goes the opposite way from what you want.

    Sounds ancient, like before unified shaders ancient.

    Still doesn't matter in the context of game physics. 10x the latency doesn't mean you lose 10x the performance; it means you use 10x the latency-hiding tricks and tweaks. If performance were simply lost to latency, no console would use unified memory, because that's a latency beast. Most devs prefer it, though, for its benefits.
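
    For anyone curious what a "latency-hiding trick" looks like, the classic one is double buffering, the same idea used to overlap SPE DMA with compute on Cell: while one buffer is being processed, the next transfer is already in flight. This is a language-agnostic sketch of the pattern, not actual SPE code:

    ```python
    # Double-buffering sketch: overlap "transfers" with compute.
    # In real SPE code the slice below would be an async DMA get; here a
    # plain list slice stands in for it so the pattern is easy to follow.
    def process_chunks(data, chunk_size, compute):
        """Process `data` chunk by chunk using two alternating buffers."""
        results = []
        buffers = [None, None]
        buffers[0] = data[0:chunk_size]  # "prefetch" the first chunk
        i, cur = 0, 0
        while i < len(data):
            nxt = (cur + 1) % 2
            # Kick off the next "transfer" before computing on the current chunk,
            # so transfer latency is hidden behind the compute.
            buffers[nxt] = data[i + chunk_size:i + 2 * chunk_size]
            results.extend(compute(buffers[cur]))
            i += chunk_size
            cur = nxt
        return results

    doubled = process_chunks(list(range(8)), 2, lambda chunk: [x * 2 for x in chunk])
    ```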

    This is completely true, but you have to take it in context. It means that CPUs aren't 20x as bad at PhysX as a GPU; they are only 3-4x as bad. That article is also fairly old, and PhysX has since been multithreaded and updated, IIRC.
     
  10. Ninjaprime

    Regular

    Joined:
    Jun 8, 2008
    Messages:
    337
    Likes Received:
    1
    Not everything, yet, but we know miles more about it than we did about Larrabee. It's already far more of a real product than Larrabee ever was at any stage. You can actually get 32-core demo boards from Intel right now; I don't think that was ever really the case with Larrabee.
     
  11. steampoweredgod

    Banned

    Joined:
    Nov 30, 2011
    Messages:
    164
    Likes Received:
    0
    The SPEs do not need heavy involvement from the PPE; from what I hear the involvement can be quite small, and they can be pretty much autonomous, so a stronger PPE can very likely deal with more SPEs. As for the EIB, isn't that a simple ring bus? While performance might suffer, it shouldn't take much space.

    The old Cell servers can reach 6 TFLOPS and are among the greenest, most power-efficient architectures; we could compare the old boards using 45nm 9-core Cells, or upcoming 28nm tech if you like.

    Multi-chip solutions can easily go into consumer boards if we're talking PC consumers, and multiple such boards can be put in a PC.


    Edit: you are correct, but simple interconnects are still said to be viable up to 64-core chips.

    The old Ageia PPU, with about 125M transistors, had what seems like 17 cores in what is said to be a similar design.
    I think Xenos was mentioned in that thread, and that is unified, though I'd have to check.

    If cost savings are good enough, that can make up for performance losses. Cell is fed by XDR for performance reasons; I'm not sure how that fares latency-wise. Considering the importance of ever-larger caches in modern CPUs, it seems clear latency matters for at least some tasks; even GPUs have local memory to help them out.
    That article is about a year old, if what it says is correct; we'd have to check that nothing horrible has been going on in more recent PhysX code. And you have to remember PPUs, which resemble Cell (and Cell itself), are said to be more apt for physics than GPUs. PPUs were claimed in their time to be up to 200 times faster than CPUs at some physics tasks.

    According to some comments here on Beyond3D, Cell outclassed a 4-core i7 in one developer's physics code, to the point that the difference was visually noticeable, causing them to use the PS3 version of the software.

    If Cell- and PPU-like hardware truly is the more optimal design for physics calculations, I have a hard time seeing more optimal silicon losing out to a less optimal use of the same silicon.
     
    #111 steampoweredgod, Dec 7, 2011
    Last edited by a moderator: Dec 8, 2011
  12. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    No. There was a lot of technical disclosure regarding Larrabee, and there has been none so far for KC. The 32-core boards are rebadged Larrabee chips.
     