Xbox One (Durango) Technical hardware investigation

Discussion in 'Console Technology' started by Love_In_Rio, Jan 21, 2013.

Thread Status:
Not open for further replies.
  1. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
    it would more likely mean there were fewer or no esram troubles to begin with. you have to get pretty convoluted to turn it into a negative somehow.
     
  2. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    On Pitcairn, there are actually 1280 small register files (4 kB in size), each with a bandwidth of 16 bytes per clock (Hornet's number was too low ;)). There are 20 LDS arrays of 64 kB (each consisting of 32 banks of 2 kB) with a bandwidth of 4 bytes per bank and clock (up to 128 bytes per clock per LDS). There are 20 vector L1-D caches, each delivering up to 64 bytes per clock. There are 6 scalar L1-D caches (also working as constant caches); each client (3 or 4 CUs are linked to one sL1-D) can fetch up to 16 bytes per clock (up to 64 bytes per clock per sL1-D). And finally there are 8 tiles of 64 kB L2 cache. Each tile has a bandwidth of up to 64 bytes per clock.

    As you see, there is not a single isolated data structure in a Pitcairn GPU (or in any other GCN GPU) that can be read faster than the 128 bytes per cycle the 32MB eSRAM of Durango is capable of.
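
    To make the comparison concrete, here is a minimal sketch that tabulates the per-structure figures quoted above and checks each one against the 128 bytes per clock of the eSRAM. The dictionary layout and function names are hypothetical, chosen only for illustration; the numbers are the ones from this post.

```python
# Peak read bandwidth per structure on Pitcairn, in bytes per clock.
# All figures are taken from the post above; nothing here is measured.
PITCAIRN_STRUCTURES = {
    "vector register file (4 kB)": {"count": 1280, "bytes_per_clock": 16},
    "LDS array (64 kB)": {"count": 20, "bytes_per_clock": 32 * 4},  # 32 banks x 4 bytes
    "vector L1-D cache": {"count": 20, "bytes_per_clock": 64},
    "scalar L1-D cache": {"count": 6, "bytes_per_clock": 4 * 16},  # up to 4 clients x 16 bytes
    "L2 tile (64 kB)": {"count": 8, "bytes_per_clock": 64},
}

ESRAM_BYTES_PER_CLOCK = 128  # figure quoted for the 32MB eSRAM


def faster_than_esram(structures, esram=ESRAM_BYTES_PER_CLOCK):
    """Return the names of single structures that exceed the eSRAM figure."""
    return [name for name, s in structures.items() if s["bytes_per_clock"] > esram]


def aggregate_bytes_per_clock(structures):
    """Sum over all instances of each structure (a composite, not a single array)."""
    return {name: s["count"] * s["bytes_per_clock"] for name, s in structures.items()}


if __name__ == "__main__":
    for name, s in PITCAIRN_STRUCTURES.items():
        print(f"{name}: {s['bytes_per_clock']} B/clk each, {s['count']} instances")
    print("single structures faster than eSRAM:", faster_than_esram(PITCAIRN_STRUCTURES))
```

    Only the LDS matches the eSRAM figure per instance; the much larger aggregate numbers come from adding up many independent arrays.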
     
  3. warb

    Veteran

    Joined:
    Sep 18, 2006
    Messages:
    1,057
    Likes Received:
    1
    Location:
    UK
    It more likely does the reverse, because they haven't told devs about any downclock.
     
  4. Strange

    Veteran

    Joined:
    May 16, 2007
    Messages:
    1,698
    Likes Received:
    428
    Location:
    Somewhere out there
    The launch countries do not include Japan.
     
  5. BeyondTed

    Newcomer

    Joined:
    May 20, 2013
    Messages:
    233
    Likes Received:
    0
    Interesting data you posted.

    But what I am missing/not understanding is how do you know "128 bytes per cycle the 32MB eSRAM of Durango is capable of".

    Do you know for a fact but are not allowed to say how?

    Or do you have a reference you can share that I can go read/look at?

    I am curious if the 32MB eSRAM is a single structure or if it is many smaller pieces distributed in some way.

    I have seen a number of posts which indicate that the writers believe it is one block, but I have seen no source for that data point. Can you share that?
     
  6. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Have you compared your calculations to the VGleaks articles with Durango documentation? While that material is incomplete and not guaranteed to be 100% correct in the end, multiple outlets have corroborated that it is sourced from official documentation.
    So far, it has been very consistent with official statements.

    There are posters on this board with insider connections, which I have factored into my calculus, but the numbers I've commented on have been drawn from public sources, or can be calculated using known properties of the architectures involved.
     
  7. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    The 128 bytes per clock for the eSRAM was given in the documentation vgleaks cited from (which was confirmed to be legit). And in case you missed that, bkilian (who was in a team working on the XBOne [albeit not the eSRAM] not too long ago) also stated the 128 bytes per clock bandwidth. That number is as safe as it can be at this point in time.
     
  8. upnorthsox

    Veteran

    Joined:
    May 7, 2008
    Messages:
    2,106
    Likes Received:
    380
    Isn't the eSRAM also a composite of banks, and thus the 128 bytes per cycle is also a composite?
     
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The individual register files and caches are far more local to their memory clients. A register's data is going to the adjacent ALUs, and the caches to the adjacent CU memory pipelines.

    The L2s have more distance to travel, and they have multiple clients on the other side of the crossbar.

    The eSRAM is much larger in comparison, although it is drawn as interfacing to the GPU's memory subsystem, which may take care of servicing potential clients since the data there can go across the whole SOC. Subdivision within the eSRAM may make it a reverse situation to the L2s, where there are many more storage pools trying to wire into a smaller set of readers.
     
  10. bkilian

    Veteran

    Joined:
    Apr 22, 2006
    Messages:
    1,539
    Likes Received:
    3
    Yes, what I stated is factually correct, from inside info, although you can see it on vgleaks too. The bandwidth to the ESRAM is 102GB/s at 800 MHz.
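
    The two figures in this thread line up: 128 bytes per clock at 800 MHz works out to 102.4 GB/s. A minimal sketch of that arithmetic, assuming decimal gigabytes (1 GB = 10^9 bytes); the function name is hypothetical:

```python
def peak_bandwidth_gb_per_s(bytes_per_clock, clock_mhz):
    """Peak bandwidth in decimal GB/s (1 GB = 10**9 bytes)."""
    bytes_per_second = bytes_per_clock * clock_mhz * 10**6
    return bytes_per_second / 10**9


if __name__ == "__main__":
    # 128 bytes per clock and 800 MHz are the numbers quoted in the thread.
    print(peak_bandwidth_gb_per_s(128, 800))
```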
     
  11. BeyondTed

    Newcomer

    Joined:
    May 20, 2013
    Messages:
    233
    Likes Received:
    0
    Well that must settle it then. I don't really understand (yet) as I was expecting/wishing something else. I am having a hard time wrapping my head around 5 billion transistors and 30MB of SRAM and the resulting specs.

    So I will wait and watch to see how the two systems perform and how the gameplay and actual visual quality turn out. I am interested in how more interesting worlds (in Skyrim and Dragon Age like games) work out if cloud computing is not vaporware, but in general my interests are heavily graphics and science fiction related. I hope to see the system deliver nice graphical fidelity.
     
    #3771 BeyondTed, Jun 11, 2013
    Last edited by a moderator: Jun 11, 2013
  12. Hornet

    Newcomer

    Joined:
    Nov 28, 2009
    Messages:
    120
    Likes Received:
    0
    Location:
    Italy
    I'm not a game developer so please correct me if I'm wrong.

    The Xbox 360 guaranteed peak ROP throughput even with alpha-blending, depth-testing and MSAA. The Xbox One, like any modern GPU, does not have enough bandwidth to guarantee peak ROP throughput even without MSAA, when performing alpha-blending and depth-testing. For instance, when writing to a single 32-bit render-target with alpha-blending and depth-testing, the GPU has to read 8 bytes and write 8 bytes per pixel, which, without caching and compression in the ROPs, would bottleneck pretty much every GPU on the planet, except Xenos in the Xbox 360.

    It is my understanding that this does not matter because:
    1) modern ROPs are efficient enough to avoid many round-trips to the memory (for instance, by storing coarse depth information in the internal caches and using those values as an early-reject filter);
    2) modern rendering techniques do not rely heavily on alpha-blending in the main pass (for instance, I recall from the Killzone presentation, that when filling the g-buffer in a deferred renderer, alpha-blending is disabled);
    3) modern rendering techniques make use of multiple render targets, therefore reducing the relative weight of depth writes compared to color writes.
    GPUs used to have plenty of bandwidth, but this has not been true for several years. Hence, both hardware and software have moved toward more efficient usage of caches and local memories.
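
    As a rough sketch of the arithmetic behind this argument: with 8 bytes read and 8 bytes written per pixel, a 128 bytes per clock memory can sustain 8 blended, depth-tested pixels per clock if nothing is cached or compressed. The ROP count of 16 pixels per clock used in the second calculation is an assumption that is not stated in this thread, and the function names are hypothetical.

```python
READ_BYTES_PER_PIXEL = 8   # colour read + depth read, 32-bit each (from the post)
WRITE_BYTES_PER_PIXEL = 8  # colour write + depth write, 32-bit each (from the post)


def sustainable_pixels_per_clock(memory_bytes_per_clock):
    """Pixels per clock a memory can feed with no caching or compression."""
    return memory_bytes_per_clock / (READ_BYTES_PER_PIXEL + WRITE_BYTES_PER_PIXEL)


def rop_demand_gb_per_s(pixels_per_clock, clock_mhz):
    """Bandwidth in decimal GB/s needed to keep the ROPs at peak rate."""
    bytes_per_clock = pixels_per_clock * (READ_BYTES_PER_PIXEL + WRITE_BYTES_PER_PIXEL)
    return bytes_per_clock * clock_mhz * 10**6 / 10**9


if __name__ == "__main__":
    # 128 bytes per clock is the eSRAM figure quoted earlier in the thread.
    print(sustainable_pixels_per_clock(128))
    # ASSUMPTION: 16 pixels per clock is a hypothetical ROP rate for illustration.
    print(rop_demand_gb_per_s(16, 800))
```

    Under that assumed ROP rate the demand would be about twice the 102.4 GB/s quoted for the eSRAM, which is why the caching and early-reject points above matter.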

    I would not be surprised if some Xbox One titles render to main memory while using the ESRAM as a programmer-managed cache for textures and compute shader output. After all, similarly performing PC GPUs do not have much more bandwidth than the DDR3 pool in the Xbox One.

    I also wonder whether the use of ESRAM instead of EDRAM was just a manufacturing choice or if simulations suggested the improved latency would improve performance significantly enough to justify the additional cost.
     
  13. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    9,470
    Likes Received:
    1,686
    Location:
    Treading Water
    And why would it? Japan's home console market is near death, if it was ever alive for MS.
     
  14. NathansFortune

    Regular

    Joined:
    Mar 3, 2009
    Messages:
    559
    Likes Received:
    0
    Xbone delayed in Asia for about a year.

    This in addition to the horrible price in the UK and no subsidised box pretty much confirms the yield problems that were being touted recently.
     
  15. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
    umm, no it doesn't. the sales in asia would have been low to nonexistent (as you know), so it has no bearing, and they're launching in 21 countries.

    how many is ps4 launching in? aren't i reading on gaf that ps4 isn't launching in japan either? so they're having yield issues too? or maybe japan is just that irrelevant now?

    the price is for kinect packed in, haven't we been told a million times that it would raise the price :roll:
     
  16. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
    I still wonder if this is the case, why not tout it?

    I guess perhaps because in order to tout it, you have to admit 1.2 teraflops in the first place?

    But if MS is willing to use cloud as a power crutch, touting "super efficiency" or the like seems like an idea.


    Maybe it's all too technical. Even saying "ESRAM" would be so far over most people's heads...
     
  17. Nisaaru

    Veteran

    Joined:
    Jan 19, 2013
    Messages:
    1,133
    Likes Received:
    403
    I would wonder more why AMD or NV wouldn't jump to ESRAM/EDRAM and cash in the profits which would go to the GDDR5 producers over DDR3.
     
  18. expletive

    Veteran

    Joined:
    Jun 4, 2005
    Messages:
    3,592
    Likes Received:
    69
    Location:
    Bridgewater, NJ
    For a machine that is supposedly having ESRAM yield problems, whose hardware situation is "behind schedule and a mess", whose tools situation recently went from 'shitty' to 'partly shitty' (thx bkilian for that one), and has a 900 GFLOP GPU, I thought the XBO showed very well yesterday, particularly in-game with BF4 and with RYSE, at least as good as what was shown elsewhere.
     
  19. Hornet

    Newcomer

    Joined:
    Nov 28, 2009
    Messages:
    120
    Likes Received:
    0
    Location:
    Italy
    Well, unless some driver trick is possible, a desktop GPU with DDR3 and an ESRAM scratchpad would achieve abysmal performance on most existing titles and require developer effort for new titles (as well as Direct3D extensions). Also, a high-end GPU with such a large ESRAM scratchpad would be huge to manufacture (think of GK110 + at least 64 MB of ESRAM) and hard to cool. Intel's L4 cache approach is probably the most sensible, and it seems to perform well enough. It is still not suited for high-end GPUs, though, and it doesn't make sense for AMD and NVIDIA to introduce a non-transparent feature only for low-end and mid-range GPU models. I recall the long-term plan of AMD/NVIDIA is stacked memory, so they are probably focusing on that.
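
    For a sense of scale, here is a back-of-the-envelope sketch of how many transistors the storage cells alone would take. It assumes a conventional 6-transistor SRAM cell and counts no decoders, sense amplifiers or redundancy; neither the cell type nor the function name comes from this thread.

```python
TRANSISTORS_PER_BIT = 6  # ASSUMPTION: standard 6T SRAM cell, not confirmed for this chip


def sram_cell_transistors(size_mib, transistors_per_bit=TRANSISTORS_PER_BIT):
    """Transistors in the storage cells only, for an SRAM of size_mib MiB."""
    bits = size_mib * 2**20 * 8
    return bits * transistors_per_bit


if __name__ == "__main__":
    for size in (32, 64):
        print(f"{size} MB: {sram_cell_transistors(size) / 1e9:.2f} billion transistors")
```

    Under that assumption 32 MB comes to roughly 1.6 billion transistors and 64 MB to roughly 3.2 billion, before any of the GPU logic is counted.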
     
  20. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    I don't think it is a cache.

    I think the ESRAM is just mapped to a region of the physical address space. Allocating ESRAM is thus a question of setting up page table entries to point to this region.

    MS *could* remap existing allocations to main memory (making it a software controlled cache of sorts with 4KB pages/cache lines), but I doubt it. It could easily create hard-to-reproduce pathological cases and would require usage heuristics to be gathered on ESRAM use.
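
    A toy model of the mapping idea, assuming 4KB pages as mentioned above. The physical base address, the class and the method names are all hypothetical; this is a sketch of the concept, not of how the actual system software works.

```python
PAGE_SIZE = 4 * 1024
ESRAM_SIZE = 32 * 1024 * 1024
ESRAM_BASE = 0x8000_0000  # ASSUMPTION: made-up physical base address for illustration


class ToyPageTable:
    """Maps virtual page numbers to physical page addresses."""

    def __init__(self):
        self.entries = {}
        self.free_esram_pages = list(range(ESRAM_SIZE // PAGE_SIZE))

    def map_into_esram(self, virtual_addr, size):
        """Point the pages covering [virtual_addr, virtual_addr + size) at ESRAM."""
        if size <= 0:
            raise ValueError("size must be positive")
        first = virtual_addr // PAGE_SIZE
        last = (virtual_addr + size - 1) // PAGE_SIZE
        vpns = range(first, last + 1)
        if any(vpn in self.entries for vpn in vpns):
            raise ValueError("range overlaps an existing mapping")
        if len(vpns) > len(self.free_esram_pages):
            raise MemoryError("not enough free ESRAM pages")
        for vpn in vpns:
            page = self.free_esram_pages.pop(0)
            self.entries[vpn] = ESRAM_BASE + page * PAGE_SIZE

    def translate(self, virtual_addr):
        """Return the physical address for a mapped virtual address."""
        vpn, offset = divmod(virtual_addr, PAGE_SIZE)
        return self.entries[vpn] + offset

    def in_esram(self, virtual_addr):
        phys = self.translate(virtual_addr)
        return ESRAM_BASE <= phys < ESRAM_BASE + ESRAM_SIZE


if __name__ == "__main__":
    table = ToyPageTable()
    table.map_into_esram(0x10000, 8192)  # an 8 KB allocation spans two pages
    print(hex(table.translate(0x10000 + 5000)), table.in_esram(0x10000))
```

    In this model the allocation itself never moves; only the page table entries decide whether an address lands in the ESRAM region or elsewhere.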

    Cheers
     