Xbox One (Durango) Technical hardware investigation

Discussion in 'Console Technology' started by Love_In_Rio, Jan 21, 2013.

Thread Status:
Not open for further replies.
  1. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Yes, because the frontbuffer has to sit eventually in the DRAM. ;)
    If you do image or buffer writes, it goes through the cache hierarchy, only the ROP exports don't.
     
  2. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I'm not aware of benches for GCN. The numbers for code running on VLIW GPUs were very high.

    The Vgleaks article on the PS4's "hUMA" implementation indicates that it takes 300-350 cycles to invalidate volatile lines, with 75 related to bookkeeping and the rest possibly devoted to setup and, most importantly, the completion of all in-flight accesses. That's before any writeback penalty.
    The CU main cache path for L2 misses still looks to be long.
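    As a rough illustration of the figures quoted above, the split between bookkeeping and everything else can be worked out directly. This is only a sketch: the helper names are made up for this example, and the 800 MHz clock used for the nanosecond conversion is an assumption, not something stated in the leak.

```python
# Figures as summarized in the post above (from the Vgleaks article).
INVALIDATE_CYCLES = (300, 350)  # reported fixed cost range, in cycles
BOOKKEEPING_CYCLES = 75         # portion attributed to bookkeeping


def remaining_cycles(total_range, bookkeeping):
    """Cycles left over for setup and draining in-flight accesses."""
    low, high = total_range
    return (low - bookkeeping, high - bookkeeping)


def to_nanoseconds(cycles, clock_mhz):
    """Convert a cycle count to nanoseconds at the given clock."""
    return cycles * 1000.0 / clock_mhz


rest = remaining_cycles(INVALIDATE_CYCLES, BOOKKEEPING_CYCLES)
print(rest)  # (225, 275)

# ASSUMPTION: 800 MHz is an illustrative GPU clock, not taken from the leak.
ASSUMED_CLOCK_MHZ = 800
print([to_nanoseconds(c, ASSUMED_CLOCK_MHZ) for c in INVALIDATE_CYCLES])
```

    So roughly three quarters of the quoted fixed cost would sit outside the bookkeeping itself, which is the part that could plausibly differ between memory paths.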

    Perhaps the ROP caches can bypass that bit? There could be a different latency number for the different memory paths.
    That wouldn't help for cases where the CUs need to read from the eSRAM, but perhaps blending and the like could see a clearer latency difference.
    It might help if we knew more about the organization and implementation of the eSRAM. Some of the admittedly unverified writing about it hints that there may have been a desire to keep the overall basis of its operation similar to the banking scheme employed by external memory. Hopefully the various programmable timings within that aren't as prohibitive.
     
  3. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Isn't that 300-350 cycles more a measure of the depth of the queues (scaling with the number of accesses in flight) and not so much an (unloaded) access latency by itself? The whole thing usually has to deal with some heavy contention. That's probably part of the reason why it isn't that simple to put just one latency number on it.
    If one does just blending (which is a fire-and-forget operation from the view of the shader), the performance is mostly determined by bandwidth, as is evident from fillrate tests. Loading and storing framebuffer tiles instead of individual pixels to and from the ROP caches vastly reduces the number of read-write turnarounds and makes life for the DRAM controller a lot easier (long bursts). Things could change, of course, if you need to read back the just-written render target (worst case: while you are still writing to it).
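    To put a number on the turnaround argument, here is a toy count of read/write direction switches for a read-modify-write blend pass. It is a deliberately simplified model, not a description of any real DRAM controller: the function name and the 8x8 tile size are assumptions chosen for illustration.

```python
def bus_turnarounds(num_pixels, pixels_per_transfer):
    """Count read<->write direction switches for a read-modify-write pass.

    Toy model: each group of pixels is read once and written back once.
    Every group costs one read->write switch, and every group except the
    last is followed by one write->read switch.
    """
    groups = -(-num_pixels // pixels_per_transfer)  # ceiling division
    if groups == 0:
        return 0
    return 2 * groups - 1


PIXELS = 1920 * 1080
TILE_PIXELS = 8 * 8  # ASSUMPTION: illustrative tile size, not from the thread

per_pixel = bus_turnarounds(PIXELS, 1)
per_tile = bus_turnarounds(PIXELS, TILE_PIXELS)
print(per_pixel, per_tile)
```

    In this model the number of turnarounds drops by roughly the tile size, which is the sense in which tiles let the controller work in long bursts.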
     
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The leak doesn't state the why, although it calls that count a fixed cost. It may be a pessimistic implementation of the invalidate logic, if it cannot tell whether it is in an unloaded case.
    The wavefront granularity of vector ops and aggressive coalescing may have expanded the minimum period of time before the pipeline will allow an access to make its way past each stage in the pipeline.

    Vector export instructions have to be granted access to the export bus, which could be where back pressure from the ROPs can lead to CU stalls if, for example, EXP_CNT is set low or some kind of pathological case leads to a build-up that exceeds its max count.
     
  5. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    That's of course possible. But this usually means one is bandwidth limited anyway. The ROPs should be able to buffer a few exports before they stop accepting new ones (or are they completely relying on the latency hiding of the shader cores?). That EXP_CNT gets decreased doesn't mean the export and the write to memory were carried out. It just means that the values have been read from the registers and placed in the respective queue (so one can reuse the registers for something else). That means backpressure sets in only when the ROPs can't handle the exports fast enough, which is mainly a throughput thing.
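    The throughput argument can be sketched with a toy bounded-queue model. This is not how GCN hardware is implemented; `simulate_exports`, the queue depth and the per-cycle rates are all hypothetical and only illustrate that stalls appear when the consumer is slower than the producer, regardless of how deep the buffer is.

```python
from collections import deque


def simulate_exports(num_exports, queue_depth, issue_per_cycle, drain_per_cycle):
    """Toy model: a shader issues exports into a bounded queue drained by ROPs.

    Returns (total_cycles, stall_cycles). A stall cycle is one in which the
    shader still had an export to issue but found the queue full.
    """
    queue = deque()
    issued = 0
    cycles = 0
    stalls = 0
    while issued < num_exports or queue:
        # Consumer side drains first, freeing slots for this cycle.
        for _ in range(min(drain_per_cycle, len(queue))):
            queue.popleft()
        # Producer side issues until it runs out of work or queue space.
        for _ in range(issue_per_cycle):
            if issued == num_exports:
                break
            if len(queue) >= queue_depth:
                stalls += 1
                break
            queue.append(issued)
            issued += 1
        cycles += 1
    return cycles, stalls


print(simulate_exports(1000, 4, 1, 1))   # consumer keeps up
print(simulate_exports(1000, 4, 2, 1))   # consumer too slow, shallow queue
print(simulate_exports(1000, 64, 2, 1))  # consumer too slow, deeper queue
```

    A deeper queue only delays the point at which stalls begin; it does not remove them when the drain rate is the bottleneck.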
     
  6. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I'm not sure if the effect has been completely teased out.
    For example, Tahiti introduced a crossbar after the ROPs that fed to memory, which was responsible for the significant clock- and bandwidth-normalized improvement over Cayman.
    Localized contention for channels did pose a problem, although how much of the underutilization was due to localized bandwidth contention versus a possible limit to the number of pending operations the ROPs can buffer isn't clear.
    I think that it might be evidence of cases where bandwidth consumption can be uniform globally over long periods of time, but with short-term contention that the ROPs cannot buffer around.

    The rate that the eSRAM can send and receive data on a per-controller basis would be higher, assuming each 8 MB block has its own controller. The reduced amount of contention for those controllers and the reduced latency would require less buffering, which apparently the ROPs aren't that good at.
     
  7. oldschoolnerd

    Newcomer

    Joined:
    Sep 13, 2013
    Messages:
    65
    Likes Received:
    8
    Thanks.

    Just signed up to post this, and I have had a beer, but ... I have just got to say thanks for the wicked thread. I have spent at least 14 hours reading this ... Compared to the rest of the internet, you guys are right up there. You know who you are. Cheers.

    Onto the subject at hand. I think the Xbox One hardware looks absolutely sweet. Elegant. It's been 25 years since I got to metal on the Amiga, but I would love to get into this bad boy. Those of you getting paid to work on this...nice one.

    At the end of the day it's going to come down to the software. If the APIs are on the money...it will fly.

    Sorry for the lack of technical detail...
     
  8. Brad Grenz

    Brad Grenz Philosopher & Poet
    Veteran

    Joined:
    Mar 3, 2005
    Messages:
    2,531
    Likes Received:
    2
    Location:
    Oregon
    And that's still a lot faster than having to flush the whole cache which would take something like 4K cycles on Xbox One, no?
     
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    In many cases, I believe so, but that's not relevant to the point I was discussing, which is whether that can be used as a hint about the memory pipeline's contribution to memory latency without the external DRAM.

    That's only the fixed initial cycle cost.
    The variable latency component is dependent on how much needs writeback and the speed of the bus used.
     
  10. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    4096 to eSRAM to be exact.
     
  11. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Add the latency to that figure to be really exact. ;)
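    For reference, one set of assumptions that reproduces the 4096 figure is a fully dirty 512 KiB L2 written back over a 128-byte-per-cycle path to eSRAM. Neither number is stated in this thread, so treat both as assumptions, and the fixed latency below as a pure placeholder.

```python
KIB = 1024


def writeback_cycles(dirty_bytes, bus_bytes_per_cycle):
    """Cycles needed to stream dirty data over a bus (ceiling division)."""
    return -(-dirty_bytes // bus_bytes_per_cycle)


def flush_cycles(dirty_bytes, bus_bytes_per_cycle, fixed_latency_cycles=0):
    """Fixed up-front cost plus the bandwidth-bound writeback time."""
    return fixed_latency_cycles + writeback_cycles(dirty_bytes, bus_bytes_per_cycle)


# ASSUMPTIONS (not stated in the thread): cache size and bus width.
L2_BYTES = 512 * KIB
ESRAM_BYTES_PER_CYCLE = 128
# ASSUMPTION: placeholder fixed latency, only to show how it adds on.
ASSUMED_FIXED_LATENCY = 300

print(flush_cycles(L2_BYTES, ESRAM_BYTES_PER_CYCLE))  # 4096
print(flush_cycles(L2_BYTES, ESRAM_BYTES_PER_CYCLE, ASSUMED_FIXED_LATENCY))  # 4396
```

    A partially dirty cache would shrink only the writeback term; the fixed term stays the same.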
     
  12. HeLL

    Newcomer

    Joined:
    Apr 1, 2013
    Messages:
    7
    Likes Received:
    0
  13. Brad Grenz

    Brad Grenz Philosopher & Poet
    Veteran

    Joined:
    Mar 3, 2005
    Messages:
    2,531
    Likes Received:
    2
    Location:
    Oregon
    Power supply ratings do not imply actual system power consumption. Xbox One likely consumes far less than 253 Watts in actual operation.
     
  14. Michellstar

    Regular

    Joined:
    Mar 5, 2013
    Messages:
    662
    Likes Received:
    380
    Well, it has to power the SoC itself, RAM, HDD, Blu-ray drive, southbridge, and Kinect.
     
  15. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
    Insignificant next to the power of the [strike]sauce[/strike] SoC.


    Anyways, the actual max DC output should be 12V*17.9A -> 214.8W, plus 5W standby. The 253 number is a bit weird (it's given in watt-hours, i.e. a unit of energy like joules, not power). Even stranger that it's next to Spanish (?) when the rest of the label is in English and Chinese.
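    The label arithmetic is easy to check. The sketch below only restates the numbers from the post; the function names are made up for illustration, and the conversion shows why a watt-hour figure is an amount of energy rather than a power rating.

```python
def dc_power_watts(volts, amps):
    """Power delivered on a DC rail: P = V * I."""
    return volts * amps


def watt_hours_to_joules(watt_hours):
    """1 Wh is 1 W sustained for 3600 s, i.e. 3600 J."""
    return watt_hours * 3600.0


main_rail = dc_power_watts(12.0, 17.9)
print(round(main_rail, 1))        # 214.8
print(watt_hours_to_joules(253))  # 910800.0
```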
     
  16. SlimJim

    Banned

    Joined:
    Aug 29, 2013
    Messages:
    590
    Likes Received:
    0
    That is the maximum power rating. There is headroom because the power supply's maximum output drops with every year of use.

    edit: unless you were hinting at... undocumented extra parts?.. because in that case you were pretty discreet ...:cool2:
     
  17. Michellstar

    Regular

    Joined:
    Mar 5, 2013
    Messages:
    662
    Likes Received:
    380

    Yes, the main contributor is the SoC, but we can't dismiss Kinect's draw; until the Slim models, Kinect required an additional power connector.

    If the label is legit and not a Chinese clone, I guess it is meant for Mexico: input 100-127 V (I think they use 110 V at home).
     
  18. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
    Never seen Star Wars, have you? :(

    Yes. Kinect's power adapter even specifies 12V, 1.08A max output, though K1 does have motors.

    The sensor block diagram from Hot Chips doesn't seem to give the impression of any high-power-consuming component though. *shrug*

    It was just funny to see 2 lines out of them all have a third language.
     
  19. Michellstar

    Regular

    Joined:
    Mar 5, 2013
    Messages:
    662
    Likes Received:
    380
    Man, I don't get the SW joke :(
     
  20. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    You mean Spaceballs, right?
     