Xbox One (Durango) Technical hardware investigation

Discussion in 'Console Technology' started by Love_In_Rio, Jan 21, 2013.

Thread Status:
Not open for further replies.
  1. AlBran

    AlBran Ferro-Fibrous
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    20,733
    Likes Received:
    5,825
    Location:
    ಠ_ಠ
    Cryptography would imply security... It's probably the ARM chip (TrustZone) that AMD announced plans to integrate a while ago.
     
  2. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    859
    Likes Received:
    262
    There is nothing complicated in my "fantasy". IMO it is one possibility that needs only very minor changes to the memory controller and nothing else. The direct ESRAM-to-GPU path is already more complicated; that could also have been done via the memory controller, if there weren't a reason not to.
     
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,125
    Likes Received:
    2,885
    Location:
    Well within 3d
    Multi-banking seems like it could be possible. If we want the CPUs to overlap execution with data engine loads, they can switch to using banks not being filled or written back by a DME.
    Arbitration would be relatively simple, a bank is either ready or it's not.

    Maybe by assigning pages to specific cores and then pinning those address ranges to banks, the CPUs can avoid running into each other.
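
A minimal sketch of that idea, to make the "ready or not" arbitration concrete. Everything here is an assumption for illustration: the bank count, the page size, and the names `BankedSram` and `pin_pages_per_core` are hypothetical and not from any leaked document.

```python
from dataclasses import dataclass, field

NUM_BANKS = 8      # assumption: the real bank count is unknown
PAGE_SIZE = 4096   # assumption: 4 KiB pages


@dataclass
class BankedSram:
    """Toy model: each bank is either ready or busy with a DME transfer."""
    busy: list = field(default_factory=lambda: [False] * NUM_BANKS)
    page_to_bank: dict = field(default_factory=dict)

    def pin_page(self, page: int, bank: int) -> None:
        # Pin one page (an address range) to one specific bank.
        self.page_to_bank[page] = bank

    def bank_of(self, address: int) -> int:
        return self.page_to_bank[address // PAGE_SIZE]

    def dme_begin(self, bank: int) -> None:
        # A data move engine starts filling or writing back this bank.
        self.busy[bank] = True

    def dme_end(self, bank: int) -> None:
        self.busy[bank] = False

    def cpu_can_access(self, address: int) -> bool:
        # Arbitration is a single check: the bank is ready or it is not.
        return not self.busy[self.bank_of(address)]


def pin_pages_per_core(sram: BankedSram, num_cores: int, pages_per_core: int) -> dict:
    """Give each core its own pages and pin them to that core's bank."""
    owner = {}
    for core in range(num_cores):
        for i in range(pages_per_core):
            page = core * pages_per_core + i
            sram.pin_page(page, core % NUM_BANKS)
            owner[page] = core
    return owner
```

With this layout a core only stalls when a DME is working on its own bank, so the cores never contend with each other as long as there are at least as many banks as cores.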
     
  4. Dr Evil

    Dr Evil Anas platyrhynchos
    Legend Veteran

    Joined:
    Jul 9, 2004
    Messages:
    5,777
    Likes Received:
    782
    Location:
    Finland
    RSX can read and write just fine into the XDR pool. Your post makes no sense. You make an imaginary split of the memory and then discard half of it... The games definitely use most of the available 512MB of memory in both consoles. Even if half of that were used for non-graphics tasks, that doesn't mean the rest isn't there or isn't very important for the game.
     
  5. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,172
    Location:
    La-la land
    Yeah, sounds reasonable... Besides, that's pretty much what Intel's Core i-series does also, except each CPU isn't hardwired to a set of banks of course; it has that ring bus that gives access to the cache of other cores as well.
     
  6. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,125
    Likes Received:
    2,885
    Location:
    Well within 3d
    The hash function used to assign addresses to L3 slices in the Core i series is programmable at some level, since Intel can disable an arbitrary core and slice for yield purposes. The goal there is to avoid contention for the same slice.

    Durango could do the same thing at a higher level with the SRAM, or by exposing the mappings to either the run-time or software, do the opposite of Intel and try concentrating accesses to actively used banks.
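
To illustrate what a remappable address-to-slice hash could look like, here is a small sketch. The XOR-fold below is a made-up stand-in, not Intel's actual (undocumented) hash, and `slice_for_address` is a hypothetical name. The only point is that hashing into a list of active slices lets any one slice be fused off without changing the hash itself.

```python
def slice_for_address(address: int, active_slices: list, line_bytes: int = 64) -> int:
    """Map a physical address to one of the active slices.

    Hypothetical hash: XOR-fold the cache-line index into 8 bits, then
    use it to index the list of slices that are still enabled.
    """
    line = address // line_bytes
    folded = 0
    while line:
        folded ^= line & 0xFF
        line >>= 8
    return active_slices[folded % len(active_slices)]


# Example: a four-slice part with slice 2 disabled for yield.
active = [0, 1, 3]
used = {slice_for_address(addr, active) for addr in range(0, 64 * 1024, 64)}
```

Doing the opposite, concentrating accesses on a few banks, would just mean passing a shorter `active_slices` list chosen by the run-time or the software.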
     
  7. AlBran

    AlBran Ferro-Fibrous
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    20,733
    Likes Received:
    5,825
    Location:
    ಠ_ಠ
    @silhouette:

    This is a thread for the VG Leaks specs discussion, that's why your post was moved to the more general thread for rumour speculation/discussion, and the duplicate and triplicates were deleted.
     
  8. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,135
    Likes Received:
    2,248
    Location:
    Wrong thread
    The 360 has 32 GB/s unidirectional BW between GPU and daughter die for its 8 ROPS. I'm guessing Durango might have 16 ROPS.

    32 GB/s * 2 * (800 MHz / 500 MHz) = 102.4 GB/s. Close enough to the 102 GB/s in the VGLeaks data.
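
The same scaling as a few lines of Python, for anyone who wants to plug in other guesses. The 16-ROP count is only the guess made above and the function name `scaled_bandwidth_gbps` is made up for this sketch.

```python
def scaled_bandwidth_gbps(base_gbps: float, rop_ratio: float,
                          clock_mhz: float, base_clock_mhz: float) -> float:
    """Scale a known link bandwidth by ROP count and clock speed."""
    return base_gbps * rop_ratio * (clock_mhz / base_clock_mhz)


XBOX360_LINK_GBPS = 32.0   # GPU to daughter die, 8 ROPs at 500 MHz
GUESSED_ROPS = 16          # assumption: a guess, not a confirmed figure

estimate = scaled_bandwidth_gbps(XBOX360_LINK_GBPS, GUESSED_ROPS / 8, 800, 500)
```

With 16 ROPs at 800 MHz, `estimate` comes out at 102.4 GB/s.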

    So ... how about 32MB of edram (maybe on a daughter die) for feeding 16 of dem magic ROPS, only with the ability to quickly swap bus direction for GPU / CPU reads of the edram (fixing the 360's limitation)?

    The 102 / 102.4 GB/s numbers just fit so well. Magic ROPS - with their "free" read-modify-write ops - might be just the kind of special sauce that Durango needs to see increased real world efficiency, and it would be bang in line with what's worked so well for MS before. And there'd be no need for developers to start learning about using fast local store for SPU style processing. It would just be 360 edram v2.

    Edit: you could even put a bank of magic ROPS in an edram daughter die and a bank of regular ROPS in the main GPU for use with main memory. Draw whatever you want to whichever pool of memory fits best.

    Edit 2: So render your huge double buffered dynamic shadow map straight into main ram while continuing to work on successive back buffers on the magic ROPS in edram. That kind of thing. So much special sauce potential.
     
    #488 function, Jan 27, 2013
    Last edited by a moderator: Jan 27, 2013
  9. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    Interesting. Good math deduction.

    That said if ERP is right and the 68GB/s DDR3 is supposed to be used for the ROPs... *O*U*C*H*
     
  10. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,135
    Likes Received:
    2,248
    Location:
    Wrong thread
    Working out that maths just got my spider sense tingling.

    For "normal" reads and writes the BW would be seen as 102.4 GB/s as the leaked docs show it, but within the edram/sram any embedded ROPS that can perform read / modify / write (like on the 360) would have up to 204.8 GB/s. Following the full 360 way of doing things, 4X MSAA would take you up to something bonkers like 820 GB/s, but even without that you're still looking at quite a good situation.

    MS were very keen to avoid putting everything on the same bus with the 360, as they said contention with high BW use framebuffers made performance unpredictable for everything else. Hopefully the Durango leaked specs are an indication that MS hasn't given up on smart ROPS and predictable performance.
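
The multipliers behind those figures, as a quick sketch. This assumes the 360-style accounting where a read-modify-write counts as two transfers and MSAA samples multiply the traffic again; `internal_rop_bandwidth` is a made-up name, and whether Durango works this way is pure speculation.

```python
LINK_GBPS = 102.4  # external figure from the leaked docs


def internal_rop_bandwidth(link_gbps: float, msaa_samples: int = 1) -> float:
    """Effective bandwidth seen by ROPs embedded next to the memory."""
    read_modify_write = link_gbps * 2       # one read plus one write per op
    return read_modify_write * msaa_samples  # each sample is its own op


no_msaa = internal_rop_bandwidth(LINK_GBPS)        # 204.8 GB/s
with_4x = internal_rop_bandwidth(LINK_GBPS, 4)     # 819.2 GB/s, the "820" above
xbox360 = internal_rop_bandwidth(32.0, 4)          # 256 GB/s, the 360's quoted figure
```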
     
  11. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    I may be naive, but a pool of memory for the framebuffer for that reason makes sense. One developer has noted a number of times that having predictable performance is a big deal to developers: one less thing to worry about. The naive part: if you have 7-32MB of memory (framebuffer) sucking 90% of the bandwidth of your 512MB of memory, then it seems worthwhile, if it is economical, to dedicate some silicon to the framebuffer to (a) give the wide bandwidth the framebuffer needs to go full bore ahead and (b) give more consistent bandwidth to the remainder of the system (even if the "peak" is lower than a completely unified system, as it will often have more real bandwidth due to lack of framebuffer contention).

    So in theory the ESRAM for the framebuffer (+ other stuff) is not necessarily a horrible idea.

    That said, ERP has been told MS is suggesting using the 8GB of DDR3 (68GB/s) for the framebuffer. That, right there, is another major indicator that MS is *not* targeting the same performance ballpark as Sony. If Sony hits 192GB/s that is the same as the 7870XT; but the normal 7870 (20CU) and 7850 (16CU) have 154GB/s, so there is about 40GB/s floating around for the CPU even when the GPU is drawing the same bandwidth as the PC counterparts.

    IF the Durango framebuffer is in the DDR3 you have a small chunk of 7-64MB of space eating up the majority of the bandwidth of the 5GB of game space (not to mention any of the needs of the 3GB of system memory!). Maybe ERP or someone else can explain why this isn't an issue, but it does seem like a HUGE issue if your small framebuffer sucks up all your DDR3 bandwidth -- what is left over for all the other stuff the CPUs and GPU need to do? (As a snide comment @ MS: it seems they were happy with the crappy filtering on Xbox 360 games because I don't see how this situation will help address that issue).

    It seems that for a similar budget MS has aimed a lot lower than Orbis in terms of *gaming* performance, or there is some important, undisclosed information about the system.
     
  12. ERP

    ERP Moderator
    Moderator Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    The numbers are somewhat compelling for a 16 or 32 ROP part writing to the embedded memory, so I wouldn't rule out ROPS on the ESRAM.
    It's also quite possible that the information I got was misunderstood; I don't have a first-hand data source.

    In any case you'd almost certainly want to render Z and anything with blending enabled into the ESRAM, but most deferred renderers I know of do separate opaque and transparent passes.
    If you were running a deferred renderer your only other option for the first pass would be to somehow split the render targets from the first pass between memory pools, or to tile.
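
A rough sizing sketch shows why splitting or tiling comes up at all. The G-buffer layout here (1080p, four 32-bit colour targets plus 32-bit depth, no MSAA) is purely an assumed example, the 32MB pool size is the figure from the leak, and the function names are made up.

```python
import math

ESRAM_BYTES = 32 * 1024 * 1024  # 32MB embedded pool


def gbuffer_bytes(width: int, height: int, bytes_per_target: int,
                  num_targets: int, depth_bytes: int = 4,
                  msaa_samples: int = 1) -> int:
    """Total footprint of all first-pass render targets plus depth."""
    per_pixel = (bytes_per_target * num_targets + depth_bytes) * msaa_samples
    return width * height * per_pixel


def tiles_needed(total_bytes: int, pool_bytes: int = ESRAM_BYTES) -> int:
    """How many screen tiles are needed for each tile to fit in the pool."""
    return math.ceil(total_bytes / pool_bytes)


fat = gbuffer_bytes(1920, 1080, bytes_per_target=4, num_targets=4)
thin = gbuffer_bytes(1920, 1080, bytes_per_target=4, num_targets=2)
```

Under these assumptions the four-target layout is about 41.5 million bytes and needs two tiles (or one target moved to main memory), while a two-target layout fits in one.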

    It's certainly true that rendering to the ESRAM would compete for bandwidth with the CPU, but the GPU is going to do that in either case.
     
  13. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    Thanks ERP.

    I started a thread with you in mind: http://forum.beyond3d.com/showthread.php?t=62932

    Oddly enough, many, many months ago I had someone in PM who claimed to know some 1st party developers and claimed Durango would use embedded memory. He was adamant that what he was being told was that the peak compute would be lower but it would be very much like the NV/AMD flops situation. Specifically, he said the developers were talking about wavefront utilization and avoiding parts of the GPU remaining idle. He has been PM'ing me ever since the VGLeaks, telling me "he told me so" and pasting things the claimed developers are telling him. Either the PR machine is in full swing or some developers like the design?
     
  14. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,452
    Likes Received:
    110
    Uh oh! One of my theories! Out-of-order GPU!

    By the way, does this make any sense?

    We Say: Our source from our Sister Site has never let us down yet. We're leaning towards a yes on this one.

    Source Says: (09:33:46) Source:: Something they don't mention, though, is the new box will have TWO scaler chips... and what's cool about that is the UI can always be 1080p, while in-game rendering resolutions can be scaled to increase perf as needed... two dynamic resolution planes. It's nifty.
    (09:34:29) Source:: It'll definitely be BC... definitely. But, will every game work? Probably not...
    Source: showtimeforfree.com


    Could something like that be possible? Downscale for the pixel and vertex rendering, then upscale again without it looking like simple upscaling?
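
What the quote describes can be sketched as two planes composited at scan-out: a game plane rendered at whatever resolution and scaled up, and a UI plane kept at native resolution on top. This is only a toy model; it uses nearest-neighbour scaling, where a real scaler would filter, and the function names are made up.

```python
def upscale_nearest(plane: list, out_w: int, out_h: int) -> list:
    """Scale a 2D grid of pixels to out_w x out_h by nearest neighbour."""
    in_h, in_w = len(plane), len(plane[0])
    return [
        [plane[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def composite(game_plane: list, ui_plane: list, out_w: int, out_h: int) -> list:
    """Overlay a native-resolution UI plane on an upscaled game plane.

    A UI pixel of None is treated as fully transparent.
    """
    game = upscale_nearest(game_plane, out_w, out_h)
    return [
        [ui_plane[y][x] if ui_plane[y][x] is not None else game[y][x]
         for x in range(out_w)]
        for y in range(out_h)
    ]
```

The game plane can change size from frame to frame without the UI plane ever being rescaled, which is the "UI can always be 1080p" part.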
     
    #494 Love_In_Rio, Jan 27, 2013
    Last edited by a moderator: Jan 27, 2013
  15. kots

    Regular

    Joined:
    Oct 30, 2008
    Messages:
    394
    Likes Received:
    0

    So, they put in hardware to render at low resolutions and then upscale it, but without it looking like simple upscaling?
    It keeps getting worse, no thanks!!
     
  16. AlBran

    AlBran Ferro-Fibrous
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    20,733
    Likes Received:
    5,825
    Location:
    ಠ_ಠ
    Just sounds like overlays. :s
     
  17. inefficient

    Veteran

    Joined:
    May 5, 2004
    Messages:
    2,121
    Likes Received:
    53
    Location:
    Tokyo
    Ya, there are PS3 games that already do that. Virtua Fighter 5, when the game is set to 1080p, is the earliest example I recall noticing it on. And that's with no dedicated hw scaler. Nifty, sure. But hardly groundbreaking.
     
  18. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,172
    Location:
    La-la land
    With GPUs being as multithreaded as they are there's really no reason for out of order execution...nor feasibility I suspect, since keeping track of the ~100ish instructions in flight at any one time on a CPU is very expensive in both power and transistors. Now think of a GPU, with thousands of instructions, tens of thousands for a large chip.

    Nah. Not happening, for a while yet at least. If ever. :D
     
  19. Xenio

    Regular Banned

    Joined:
    Jan 18, 2013
    Messages:
    447
    Likes Received:
    0
    I agree, too much of a stretch.
     
  20. Xenio

    Regular Banned

    Joined:
    Jan 18, 2013
    Messages:
    447
    Likes Received:
    0
    I was thinking... it's not possible to use ESRAM with classic deferred rendering engines, but more modern tile-based deferred engines can do the magic. They use screen tiles to group lights, with a tight tile frusta to cull non-intersecting lights (reducing the number of lights to consider).

    Am I right?

    This was discussed by Lauritzen at SIGGRAPH 2010.

    Is it possible to tile in a fast 32 MB?
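
The light grouping step can be sketched on the CPU like this. It is a simplified 2D version that treats each light as a screen-space circle and each tile as a rectangle; the real technique builds per-tile frusta with depth bounds in a compute shader, and `bin_lights` is a made-up name.

```python
def bin_lights(lights: list, width: int, height: int, tile: int = 16) -> list:
    """Return, for each screen tile, the indices of lights that touch it.

    lights is a list of (centre_x, centre_y, radius) in pixels.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]

    for index, (cx, cy, radius) in enumerate(lights):
        # Only visit tiles overlapped by the light's bounding box.
        x0 = max(int((cx - radius) // tile), 0)
        x1 = min(int((cx + radius) // tile), tiles_x - 1)
        y0 = max(int((cy - radius) // tile), 0)
        y1 = min(int((cy + radius) // tile), tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                # Closest point of the tile rectangle to the light centre.
                nx = min(max(cx, tx * tile), (tx + 1) * tile)
                ny = min(max(cy, ty * tile), (ty + 1) * tile)
                if (nx - cx) ** 2 + (ny - cy) ** 2 <= radius ** 2:
                    bins[ty][tx].append(index)
    return bins
```

Each tile then shades only the lights in its own list. Note that the culling itself does not shrink the G-buffer; whether a tiled renderer fits in 32 MB still depends on how much of the G-buffer has to be resident at once.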
     