Predict: The Next Generation Console Tech

Discussion in 'Console Technology' started by Acert93, Jun 12, 2006.

Thread Status:
Not open for further replies.
  1. ShadowRunner

    Veteran

    Joined:
    Apr 6, 2008
    Messages:
    1,440
    Likes Received:
    0
    So what do we need the extra CPU power for? Is it feasible for Sony to use Cell, as is, in PS4 and, with the savings made, have a much better GPU?

    I would think that the main problem with this is diminishing returns: if PS4 uses a top-of-the-line GPU anyway, doubling the amount spent on the GPU won't give double the power, so in this regard spending that budget on the CPU could give better gains. Hope that made sense :lol:
     
    #2941 ShadowRunner, Nov 27, 2009
    Last edited by a moderator: Nov 27, 2009
  2. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    113
    Location:
    New Zealand
    If the GPU is truly going to consume the CPU, then what's the point in a new CPU architecture at this point? In Microsoft's case especially, is there any reason why they would need to do more than add, say, another core and out-of-order execution at the 28nm GlobalFoundries node, with that taking up 50-75mm^2 of die space, and simply devote the rest of the 200-250mm^2 die to a new and advanced GPU architecture, beef up the eDRAM to 30MB, and chuck in a GDDR5 memory bus with 8 * 1024Mbit/2048Mbit GDDR5 modules on a 128-bit bus?

    Is there any reason why more than this minimum is required to extract good/cheap performance for the next generation? 80GB/s of bandwidth should be more than enough with eDRAM + memory bandwidth, and slightly more than Juniper performance should give you enough juice to run most games at 60 FPS @ 1920x1080 with 2xMSAA in an architecture that shouldn't use more than 100-120W all up.
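
The 80GB/s figure can be sanity-checked with a few lines of arithmetic. This is only a sketch: the 128-bit bus width and the 5 Gbit/s per-pin data rate are assumptions (the post does not state a data rate), and `bandwidth_gb_s` is a hypothetical helper name.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Peak bus bandwidth in GB/s: pins x Gbit/s per pin, divided by 8 bits per byte."""
    return bus_width_bits * data_rate_gbps_per_pin / 8

# Assumed: 128-bit GDDR5 bus at an assumed 5 Gbit/s per pin.
gddr5_bw = bandwidth_gb_s(bus_width_bits=128, data_rate_gbps_per_pin=5.0)
print(gddr5_bw)
```

Under those assumptions the result matches the 80GB/s quoted in the post; a different data rate scales it linearly.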
     
  3. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,528
    Likes Received:
    862
    Power 7 has dual issue for VMX. Being OOO and having 6-way issue would bring it pretty close to your four SPEs in raw computing power. Running real programs, it would likely stomp all over them.

    And you shouldn't really pitch one P7 vs. 4 SPEs, you should pitch 8 P7s vs. 32 SPEs.

    That's not a benefit.... At all!

    It means that all data going into and out of the SPEs is from/to main memory, wasting precious bandwidth (and adding latency).

    Cheers
     
  4. Weaste

    Newcomer

    Joined:
    Nov 13, 2007
    Messages:
    175
    Likes Received:
    0
    Location:
    Castellon de la Plana
    Why does the Wikipedia article on Power7, which quotes an IBM presentation, say that an 8-core Power7 chip gives 258.6 GFlops? The IBM presentation does not specify what that is exactly; it could be DP, it could be SP, but 260 GFlops is not much more than what Cell can theoretically deliver.

    http://www.it.utah.edu/leadership/committees/IT_Managers/papers/IBMinEducation.ppt

    It's on the last page: 32.3 GFlops per Power7 core, which is basically the same as an SPE at 4GHz. 32 SPEs would give a theoretical TFlop.

    What's the difference in power consumption?
     
    #2944 Weaste, Nov 27, 2009
    Last edited by a moderator: Nov 27, 2009
  5. Crossbar

    Veteran

    Joined:
    Feb 8, 2006
    Messages:
    1,821
    Likes Received:
    12
    Integer performance?
    The SPUs are dual issue, so well-written code should have an IPC close to 2.

    But consoles run games so real programs are not really interesting.

    I am not sure I understand your point, could you elaborate?

    Or you could look at it from the other side: asynchronous memory access lets you use the memory bandwidth close to its maximum without seeing a drop in performance. Do you really want to store all streamed data in the L3 cache anyway?
     
  6. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,528
    Likes Received:
    862
    It's double precision. Since each core does 4 FMADDs per cycle with just two FP issue ports, it is vectorized FP. Single precision would be double that, at least it would be in a consolized version.

    Anyway, stop looking at megabollocks/second. There isn't all that much correlation between peak FP throughput and actual performance. A Core 2 Duo @ 2.4GHz has under 40 GFLOPS of peak SP throughput, but crushes XCPU in every single game that is cross platform.
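
The peak numbers being traded here all come from the same multiplication: cores x clock x FLOPs per cycle. The sketch below reproduces them under stated assumptions. The 4.04GHz Power7 clock is an assumption chosen to land on the quoted 258.6 GFlops, the Core 2 Duo is assumed to issue one 4-wide SP add and one 4-wide SP multiply per cycle, and `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak = cores x clock (GHz) x FLOPs issued per cycle per core."""
    return cores * clock_ghz * flops_per_cycle

# Power7: 4 FMADDs per cycle per core (from the post), each counted as 2 FLOPs.
# The 4.04GHz clock is an assumption, not a figure from the thread.
power7_chip_dp = peak_gflops(cores=8, clock_ghz=4.04, flops_per_cycle=4 * 2)

# Single precision assumed to be double the DP rate, as the post suggests.
power7_chip_sp = 2 * power7_chip_dp

# Core 2 Duo @ 2.4GHz: assumed 8 SP FLOPs per cycle per core.
core2duo_sp = peak_gflops(cores=2, clock_ghz=2.4, flops_per_cycle=8)

print(round(power7_chip_dp, 1), round(power7_chip_sp, 1), round(core2duo_sp, 1))
```

These are peak figures only, and the post's point stands: they say little about how real game code runs.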

    Cheers
     
  7. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,528
    Likes Received:
    862
    There's a world of difference between having a concurrency of 4 and 32.

    But you would still be limited by your main memory bandwidth. At CELL's 256 GFLOPS and 20GB/s bandwidth you need to do, on average, 12 floating point ops per byte, or around 50 ops per 32-bit SP value.
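
The ops-per-byte estimate is peak throughput divided by memory bandwidth. A small sketch using the post's own 256 GFLOPS and 20GB/s figures (the function and variable names are hypothetical):

```python
def flops_per_byte(peak_gflops, bandwidth_gb_s):
    """FLOPs that must be done per byte fetched to keep the ALUs busy at peak."""
    return peak_gflops / bandwidth_gb_s

BYTES_PER_SP_VALUE = 4  # one 32-bit single-precision float

cell_ops_per_byte = flops_per_byte(peak_gflops=256.0, bandwidth_gb_s=20.0)
cell_ops_per_value = cell_ops_per_byte * BYTES_PER_SP_VALUE

print(cell_ops_per_byte, cell_ops_per_value)
```

This gives 12.8 ops per byte and roughly 51 ops per SP value, in line with the rounded figures in the post.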

    No, I don't. I want a reasonable prefetcher to load the data for me, and I want to use stores with non-temporal hints to store past the caches.

    Cheers
     
    #2947 Gubbi, Nov 27, 2009
    Last edited by a moderator: Nov 28, 2009
  8. Weaste

    Newcomer

    Joined:
    Nov 13, 2007
    Messages:
    175
    Likes Received:
    0
    Location:
    Castellon de la Plana
    Isn't that the difference though? Games don't need DP, so if an 8 core Power7 delivers half a TFlop, it's still hypothetically less than a 32 SPE Cell.

    As for the Core 2 Duo vs the Xenon, I thought that there was a consensus that the PPE or Xenon core was a relatively poor processor? Cell isn't a 3-core PPE. Also, what work does a Core 2 Duo do in a PC game in relation to, say, what Cell does in certain PS3 games? Games have fixed performance on the consoles because developers know what is going on with the hardware at any given moment. That cannot be said about a PC game.

    What is the power consumption of an 8 core Power7?
     
  9. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,135
    Likes Received:
    2,248
    Location:
    Wrong thread
    Actually, in some heavily threaded games like GTA4 and Lost Planet (in the CPU intensive Caves section for example) the Core 2 Duo isn't all that much faster. In Saints Row 2 it's actually slower, iirc.

    Not that this changes the essence of your point, but there are cases where for whatever reason the XCPU does okay compared to dual core processors from 2006.
     
  10. Crossbar

    Veteran

    Joined:
    Feb 8, 2006
    Messages:
    1,821
    Likes Received:
    12
    Yes and no. If you arrange your jobs in one queue, which is the preferred method on Cell, then increasing the number of SPEs does not add any additional complexity at all.
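
The single-queue idea can be sketched with ordinary threads standing in for SPEs. This is only an illustration under that assumption: it does not model local store or DMA, and `run_jobs` is a hypothetical name. The point it shows is that the submitting code is identical for 4 or 32 workers.

```python
import queue
import threading

def run_jobs(jobs, num_workers):
    """Drain one shared job queue with num_workers threads; return results in job order."""
    work = queue.Queue()
    for item in enumerate(jobs):
        work.put(item)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                index, job = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker exits
            value = job()
            with lock:
                results[index] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(jobs))]

# Same job list, different worker counts: only num_workers changes.
jobs = [lambda n=n: n * n for n in range(16)]
four = run_jobs(jobs, num_workers=4)
thirty_two = run_jobs(jobs, num_workers=32)
print(four == thirty_two)
```

Whether the jobs themselves expose enough parallelism to keep 32 workers busy is a separate question, which is the concern raised about concurrency of 4 versus 32.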

    BTW, didn't the Power7 have 4 hardware threads/core?

    There are of course cases where Cell would benefit from an L3 (L2) cache, and I am not sure how much of the DMA traffic goes through Cell's L2 cache, but if you want to get elaborate with hints you can also start passing data between SPEs within the CPU; their local memory will after all be four times the density of Power7.

    And I can't help believing that process switches are what really make that big L3 cache shine. Therefore I think Power7 is more likely to end up in servers than in a next-gen console.
     
  11. Weaste

    Newcomer

    Joined:
    Nov 13, 2007
    Messages:
    175
    Likes Received:
    0
    Location:
    Castellon de la Plana
    I don't think that SPE DMA transfers touch the Cell cache at all.
     
  12. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,583
    Likes Received:
    703
    Location:
    Guess...
    I think that has much more to do with the level of effort from the developers than the relative performance of the CPUs.

    Lost Planet has great performance on a C2D anyway. Saints Row on the other hand must have been programmed by monkeys. I've never seen such an insanely bad porting effort in my life. The game would probably run better through an emulator!
     
  13. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,462
    Likes Received:
    395
    Location:
    Somewhere over the ocean
    And what about a custom processor built from Power7 cores without the double-precision FP transistors?

    Could something like a 4-core / 16-thread chip with 8MB of eDRAM and almost nothing else be powerful enough, easy to program for, small to produce, and low-power enough to fit in a next-gen console?
     
  14. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    Did Repi post his love for Nehalem here yet?

    16 Threads on the CPU seems like a decent average target for 2012.
     
  15. msia2k75

    Regular Newcomer

    Joined:
    Jul 26, 2005
    Messages:
    326
    Likes Received:
    29
    That's for a next-gen console targeting a 2012 release, right?
     
  16. Crossbar

    Veteran

    Joined:
    Feb 8, 2006
    Messages:
    1,821
    Likes Received:
    12
    I don't think the question is if you can make something that is good enough, because you obviously can. The question is more like is it the best alternative considering the target application which is a game console?

    Questions like:
    • Is single thread performance that important?
    • Is integer performance more important than floating point performance?
    • Is the L3 cache that important if you have full control of your data streams and don't have to bother with frequent process switches?

    I am really not pitching Cell and SPUs as the strongest contender; IBM probably have more alternatives up their sleeves. Anyway, I think it is quite telling that neither the 360 nor the PS3 ended up with OOO CPUs this generation, but the criteria may have changed in this round.

    I also doubt we will see edram in any console CPU unless it can be supported by multiple foundries, but I do hope it will happen.
     
  17. ADEX

    Newcomer

    Joined:
    Sep 11, 2005
    Messages:
    231
    Likes Received:
    10
    Location:
    Here
    Power7

    All this talk of POWER7 and what is in the PS4 is completely ignoring power consumption and cost.

    POWER7 is a 200W chip that is large and has huge memory and I/O busses.

    Cell on the other hand has all the top places on the Green500 list. It's fast, efficient and these days about 35W in the PS3.

    What they might be able to do is have a bunch of POWER7 cores, remove the L3 and add in a load of SPEs, pretty much what the PowerXCell 32iv was going to be.

    However that may be far too big. I think a POWER7 core or 2 is a distinct possibility, they'll be fast and low power enough. However the SPEs are very efficient and very good at what they're designed for. There's also a lot of experience using them now and tools to use. I think they'll stick with them.

    Changing to Larrabee in place of Cell would be suicidal. It's a completely different arch and it's not as if it's a normal x86; I suspect it's going to be just as difficult to program as Cell, if not more so.
     
  18. Hornet

    Newcomer

    Joined:
    Nov 28, 2009
    Messages:
    120
    Likes Received:
    0
    Location:
    Italy
    From what I've read, Fermi is a significant step toward general-purpose computation on GPU chips. Having both a Cell-like CPU and a Fermi-like GPU doesn't sound likely to me. What algorithms run well on the SPE but aren't going to run well on a GPU like Fermi? Wouldn't a fair amount of OOO with a large cache be a better option? It would provide better support for algorithms that don't run well on the SPE and make multiplatform titles easier to port to the PS4. On the other hand, stream-oriented computations could be run directly on the GPU cores. Another option would be making a Cell-based GPU, adding texture units and whatever dedicated hardware is still worth having in the next decade.
     
  19. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    I thought AMD already had the graphics design wins for both the PS4 and nextbox?
     
  20. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,462
    Likes Received:
    395
    Location:
    Somewhere over the ocean
    ok look at this :p

    the ps3's cell has only 7 active spe, and some are reserved for the os
    spe cores are very small even at current production process

    what about a reworked power7+ with 4 traditional cores and 4 spe put there for compatibility and to not waste all the matured know-how?
     