Predict: The Next Generation Console Tech

Discussion in 'Console Technology' started by Acert93, Jun 12, 2006.

Thread Status:
Not open for further replies.
  1. corduroygt

    Banned

    Joined:
    Nov 26, 2008
    Messages:
    1,390
    Likes Received:
    0
    It is an impressive number, considering a Radeon 6970 has 176GB/s. I also think it's much more impressive to have 2-4 GB of memory at 200 GB/s compared to having a few tens of MB at 256 GB/s.
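For reference, the 176 GB/s figure follows from simple arithmetic: bus width in bytes times the per-pin data rate. A minimal sketch, assuming the publicly listed Radeon HD 6970 memory configuration (256-bit GDDR5 at 5.5 Gbps per pin); the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer times Gtransfers/s."""
    return bus_width_bits / 8 * data_rate_gbps


# Assumed configuration: 256-bit bus, 5.5 Gbps per pin (HD 6970 public spec).
print(peak_bandwidth_gb_s(256, 5.5))
```

The same function shows what a hypothetical 200 GB/s pool would need: on a 256-bit bus, 6.25 Gbps per pin.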
     
  2. Silenti

    Regular

    Joined:
    May 25, 2005
    Messages:
    679
    Likes Received:
    392
    What kind of bandwidth would be available with on-die RAM at this point, and is it needed above and beyond the 200-250 GB/sec that a shared memory pool would offer?
     
  3. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,879
    Likes Received:
    872
    Location:
    Somewhere over the ocean
    can you update the speculations in that post?

    and please use less technical words :S
     
  4. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,722
    Likes Received:
    141
    As far as the next xbox is concerned, I'd say the writing is rather on the wall... (SoC from Nvidia)
     
  5. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    9,249
    Likes Received:
    1,412
    Location:
    Treading Water
    :lol:
     
  6. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,722
    Likes Received:
    141
    Friendly wager of $1.00? ;)
     
  7. Tahir2

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,978
    Likes Received:
    86
    Location:
    Earth
    I am willing to act as an escrow... $1 milliard dollars, right? ;)
     
  8. assen

    Veteran

    Joined:
    May 21, 2003
    Messages:
    1,377
    Likes Received:
    19
    Location:
    Skirts of Vitosha
    Nvidia announcing *now* a SoC which will utterly destroy current *phones* in 6 months makes them a shoo-in for a system that has to compete favorably with PCs in 2013-2014? That is quite the leap of faith.
     
  9. MarkoIt

    Regular

    Joined:
    Mar 1, 2007
    Messages:
    392
    Likes Received:
    0
    What if they keep the same SIMD width? So four 16-ALU clusters, for 64 units per SIMD.
    Due to the elimination of redundant transistors, SIMDs should be smaller.

    In the PC space you need an architecture that can perform at its "best" at launch or within a meaningful lifespan (one year?). In the console space you want a more future-oriented architecture. And the trend is increasing shader workloads... and compute. A future-oriented architecture in the console space would probably have a much higher ALU:texture ratio, and maybe eliminate some fixed-function hardware (well, due to the failure of Larrabee, we will have to stick with TMUs, but we might get rid of ROPs at this point and maybe also the fixed tessellation unit).

    For example:
    32 SIMD 64-wide (4*16 ALU cluster)
    64 TMUs
    128bit MC
    32 to 64 MB of L3 cache on die.
    At 28nm, it shouldn't be much over 200-250 mm^2.
    I think XDR2 has a higher cost.
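To put numbers on the example configuration above, here is a rough sketch. The SIMD, TMU and bus-width counts come from the post; the 0.8 GHz clock and the 5.5 Gbps memory data rate are assumptions picked only for illustration, and the Cayman figures (1536 ALUs, 96 TMUs) are the public HD 6970 specs used as a comparison point.

```python
from dataclasses import dataclass


@dataclass
class GpuConfig:
    simds: int
    lanes_per_simd: int
    tmus: int
    bus_width_bits: int

    @property
    def alus(self) -> int:
        return self.simds * self.lanes_per_simd

    def alu_per_tmu(self) -> float:
        return self.alus / self.tmus

    def peak_gflops(self, clock_ghz: float) -> float:
        # One fused multiply-add per ALU per clock counts as 2 FLOPs.
        return self.alus * 2 * clock_ghz

    def external_bandwidth_gb_s(self, data_rate_gbps: float) -> float:
        return self.bus_width_bits / 8 * data_rate_gbps


proposed = GpuConfig(simds=32, lanes_per_simd=64, tmus=64, bus_width_bits=128)

print(proposed.alus)                          # total ALU count
print(proposed.alu_per_tmu())                 # ALU:TMU ratio
print(1536 / 96)                              # same ratio for Cayman (HD 6970)
print(proposed.peak_gflops(0.8))              # assumed 0.8 GHz clock
print(proposed.external_bandwidth_gb_s(5.5))  # assumed 5.5 Gbps GDDR5
```

Under these assumptions the design has twice Cayman's ALU:TMU ratio, while the 128-bit external bus delivers well under a tenth of a 1 TB/s internal figure, which is why the on-die cache carries the proposal.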


    Internal crossbar can have a bandwidth as high as 1 Terabyte/s.

    BTW: Tim Sweeney (Epic) said that for a real leap in game technology, a huge step in bandwidth is needed... on the order of terabytes per second. Since there isn't any news about the Terabyte Bandwidth Initiative from Rambus, the only way to achieve this is with a large cache.
    And someone from DICE said in one of his presentations that it's time to move to 16-way ALUs.
     
  10. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,281
    Likes Received:
    642
    NVIDIA and Microsoft simultaneously dropping out of the PCGA. Microsoft hiding Fermi specific shading language assembly instructions from the DX 11 documentation ... together with mobile cooperation I would say the writing is on the wall.
     
  11. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,722
    Likes Received:
    141
    Maxwell says hi...
     
  12. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,527
    Likes Received:
    4,605
    Location:
    Well within 3d
    SIMD-width has more to do with the number of clusters, not ALUs.
    The helpful thing about having a 4 or 5 ALU cluster is that those counts are roughly what is needed to synthesize the range of more complex operations found in the ISA, with the added benefit of allowing the compiler to repurpose the components of those instructions to perform 4 or 5 simpler operations.
    There isn't an instruction that needs 16 simpler operations, and it would be extremely challenging to find enough ILP to fill a 16-wide VLIW instruction bundle.

    The instruction bundles would be roughly 4 times as wide, and with the lack of IPC, mostly unused.
    The register file would be badly taxed by this as well. It is designed to provide peak read bandwidth for 4 FMADD operations. It would be more complex to supply 16.
    The lack of banking means routing data is much harder.

    I'm not sure if the 16 ALU-cluster SIMD would be smaller. It would most likely be 1/4 utilized for almost all workloads.
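The utilization argument can be illustrated with a toy scheduler. This is only a sketch, assuming a shader reduced to four independent dependency chains (one per vector component); that workload is an illustrative assumption, not measured data, and the function names are hypothetical.

```python
def make_chains(n_chains, length):
    """Build n_chains independent chains; each op depends on the previous one."""
    deps = {}
    for c in range(n_chains):
        for s in range(length):
            op = c * length + s
            deps[op] = [] if s == 0 else [op - 1]
    return deps


def pack_bundles(deps, width):
    """Greedy list scheduling into VLIW bundles of at most `width` ops.

    An op may issue only after all of its dependencies issued in an
    earlier bundle.
    """
    done = set()
    pending = sorted(deps)
    bundles = []
    while pending:
        ready = [op for op in pending if all(d in done for d in deps[op])]
        if not ready:
            raise ValueError("dependency cycle")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        pending = [op for op in pending if op not in done]
    return bundles


def utilization(deps, width):
    """Fraction of issue slots that hold a real op."""
    bundles = pack_bundles(deps, width)
    return len(deps) / (len(bundles) * width)


shader = make_chains(n_chains=4, length=8)  # available ILP is 4
for width in (4, 5, 16):
    print(width, utilization(shader, width))
```

With only four independent chains, widening the bundle from 4 to 16 slots does not shorten the schedule; it just leaves three quarters of the slots empty.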
     
  13. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    9,249
    Likes Received:
    1,412
    Location:
    Treading Water
    Or it might when they build it
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,527
    Likes Received:
    4,605
    Location:
    Well within 3d
    The PCGA drop-out is something I saw news about.
    Is there more information on the assembly instructions?
     
  15. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,281
    Likes Received:
    642
  16. Silenti

    Regular

    Joined:
    May 25, 2005
    Messages:
    679
    Likes Received:
    392
    Well if that is the case, does that not seal the deal? If you have to have it and the only way to get it is through on-die memory then ... ? (Not an accusation, just saying the decision may be made for the manufacturers.)
     
  17. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,281
    Likes Received:
    642
    The only way to get such a huge boost in throughput from cache is with tile based rendering, unless the entire frame fits of course (and of course even then with really heavy unique texturing you're still going to be external bandwidth limited).
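Whether "the entire frame fits" is easy to estimate. A minimal sketch, assuming 32-bit color plus 32-bit depth/stencil per sample and cache sizes counted in MiB; both are assumptions for illustration, and real render-target formats vary.

```python
import math

MIB = 2 ** 20


def framebuffer_bytes(width, height, samples=1, color_bytes=4, depth_bytes=4):
    """Bytes needed for color plus depth at the given resolution and MSAA level."""
    return width * height * samples * (color_bytes + depth_bytes)


def tiles_needed(fb_bytes, cache_bytes):
    """Minimum number of tiles so that each tile's render target fits in cache."""
    return math.ceil(fb_bytes / cache_bytes)


for label, w, h, s in [("720p", 1280, 720, 1),
                       ("1080p", 1920, 1080, 1),
                       ("1080p 4xMSAA", 1920, 1080, 4)]:
    fb = framebuffer_bytes(w, h, samples=s)
    print(label, round(fb / MIB, 2), "MiB,",
          tiles_needed(fb, 32 * MIB), "tile(s) at 32 MiB")
```

Under these assumptions a plain 1080p target fits in a 32 MiB cache in one piece, while 4x MSAA at 1080p already needs to be split.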

    Using generic cache as a render target (like Larrabee) is a bit of a waste ... for small tri's (which is where things are heading) you want really fine grained access (32 bit banked ports, similar to GPU local memory only with some FIFOs to even out the load on the banks) and normal caching would add lots of overhead for that.
     
    #5177 MfA, Feb 23, 2011
    Last edited by a moderator: Feb 23, 2011
  18. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    If you are doing one work item per fragment, why not keep the render target data in registers? Pixels aren't going to talk so why use a shared resource?

    If the shading is happening in ocl or equivalent, then it doesn't matter anyway.

    Using generic cache obviously wastes some area and/or power compared to a dedicated shared mem, but the unification of reg/shared/cached mem is a VERY BIG DEAL, imo. The increase in flexibility of the mem hierarchy is totally worth it for me.
     
  19. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,281
    Likes Received:
    642
    Pixels do talk ... z-buffering, blending, atomics.
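A small software sketch of why fragments interact: several fragments can land on the same pixel, so z-buffering is a read-modify-write on per-pixel state, and blending depends on submission order. The function names and sample values are made up for illustration.

```python
def depth_test(fragments, width, height):
    """Resolve fragments with a z-buffer; each fragment is (x, y, z, color)."""
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        # Read-modify-write on per-pixel state shared by every fragment
        # that lands on (x, y), regardless of which triangle produced it.
        if z < depth[y][x]:
            depth[y][x] = z
            color[y][x] = c
    return color


def blend(dst, layers):
    """Alpha-blend (value, alpha) layers over dst in submission order."""
    for src, alpha in layers:
        dst = src * alpha + dst * (1.0 - alpha)
    return dst


frags = [(1, 1, 0.8, "red"), (1, 1, 0.3, "blue"), (1, 1, 0.5, "green")]
print(depth_test(frags, 4, 4)[1][1])         # nearest fragment wins
print(blend(0.0, [(1.0, 0.5), (0.0, 0.5)]))  # white then black
print(blend(0.0, [(0.0, 0.5), (1.0, 0.5)]))  # black then white
```

The two blend calls use the same layers in a different order and give different results, so fragments covering one pixel cannot be processed independently without some ordering or synchronization.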

    PS. Just because the memory space is unified doesn't mean there can't be specialized caches.
     
    #5179 MfA, Feb 23, 2011
    Last edited by a moderator: Feb 23, 2011
  20. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    That depends on how you set up your tile. If you allocate one workitem per pixel, then there's no need. Other methods may be more efficient though.
     