Xbox One (Durango) Technical hardware investigation

Discussion in 'Console Technology' started by Love_In_Rio, Jan 21, 2013.

Thread Status:
Not open for further replies.
  1. McHuj

    Veteran Subscriber

    Joined:
    Jul 1, 2005
    Messages:
    1,613
    Likes Received:
    869
    Location:
    Texas
    Well, there goes that theory, so outside of the SRAM and associated DMA's, it looks like a vanilla GCN GPU then?
     
  2. dumbo11

    Regular

    Joined:
    Apr 21, 2010
    Messages:
    440
    Likes Received:
    7
    The original leak seems to indicate video codecs, audio codecs and 'Kinect multichannel echo cancellation (MEC)'.
     
  3. Hecatoncheires

    Newcomer

    Joined:
    Jan 11, 2013
    Messages:
    179
    Likes Received:
    0
    Every off-the-shelf GPU has fixed-function elements, for example the ROPs.
     
  4. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    Yup, exactly what Orbis has. Also some hardware dedicated to whatever these display planes are.

    (please note, I mention Orbis not to draw a comparison, but to show that similar architectures would likely result in similar fixed function blocks)
     
  5. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    The L2 associativity is supposed to be 8-way in Durango and 16-way for a usual GCN GPU. But I don't actually know whether that is already part of the adaptation to the eSRAM and the memory controllers in the SoC. Or maybe it even differs between GCN models (i.e. Tahiti has 16-way, Cape Verde just 8-way); I don't know. In any case, it's nothing major, as it is basically already outside the core GCN architecture.
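
    To make the 8-way versus 16-way difference concrete, here is a minimal sketch of how associativity changes the number of sets for a fixed cache size. The 512 KiB size and 64-byte line are assumptions chosen purely for illustration; they are not figures from the leak.

```python
def cache_sets(size_bytes, line_bytes, ways):
    """Number of sets in a set-associative cache of the given geometry."""
    if size_bytes % (line_bytes * ways):
        raise ValueError("size must be a multiple of line_bytes * ways")
    return size_bytes // (line_bytes * ways)


def set_index(address, line_bytes, num_sets):
    """Set that a byte address maps to (simple modulo indexing)."""
    return (address // line_bytes) % num_sets


# Hypothetical geometry: 512 KiB L2 with 64-byte lines (assumed values).
L2_SIZE = 512 * 1024
LINE = 64

sets_8way = cache_sets(L2_SIZE, LINE, 8)
sets_16way = cache_sets(L2_SIZE, LINE, 16)
print(sets_8way, sets_16way)
```

    For the same capacity, halving the associativity doubles the number of sets, so each address has fewer candidate slots and conflict misses become somewhat more likely.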
     
  6. Xenio

    Regular Banned

    Joined:
    Jan 18, 2013
    Messages:
    447
    Likes Received:
    0
    So they are calling "accelerators" something that is already in every GPU? That doesn't make any sense to me.
    Of course there will be audio and video accelerators; those are the blocks from the leaked diagram. But aegies and others say the diagram is accurate but incomplete. I wonder when we will have a complete picture of the whole structure. Anyway, Durango seems to me a complex and long-thought-out system.
     
  7. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    Vgleaks hasn't finished dumping their supposed knowledge about Durango.

    According to this poll, they have more to reveal on the audio blocks, display planes, kinect, memory architecture, video compression and more.
     
  8. dumbo11

    Regular

    Joined:
    Apr 21, 2010
    Messages:
    440
    Likes Received:
    7
    AFAICT display planes are different 'output buckets' that can be used for anything from multiple outputs to overlays...

    Given what we suspect about Durango "overlay" is a very good bet. For example:
    - 1 display plane for apps.
    - 1 display plane for the game.
    - 1 display plane for video (the HDMI input and/or any other source).

    Then some magic pixie dust to overlay them in the correct order, and resize them before sending it to the HDMI out. (e.g. it might allow your TV program to run with a twitter feed down the left, or maybe a game installation window in the top right etc, and when gaming it might show you the current program along with a video chat window).
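
    The overlay idea above can be sketched as a tiny software compositor: each plane is resized to its on-screen rectangle and then painted back to front. This is only an illustration of the guess in this post; the `Plane` class, its fields and the nearest-neighbour scaling are hypothetical and say nothing about how the real hardware works.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    name: str
    pixels: list  # rows of pixel values; None means transparent
    x: int        # top-left corner on the output
    y: int
    width: int    # on-screen size after scaling
    height: int
    z: int        # higher z is drawn on top


def scale_nearest(pixels, width, height):
    """Resize a 2D pixel grid with nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def composite(planes, out_w, out_h, background=0):
    """Paint planes back to front into a single output frame."""
    frame = [[background] * out_w for _ in range(out_h)]
    for plane in sorted(planes, key=lambda p: p.z):
        scaled = scale_nearest(plane.pixels, plane.width, plane.height)
        for r, row in enumerate(scaled):
            for c, px in enumerate(row):
                oy, ox = plane.y + r, plane.x + c
                if px is not None and 0 <= oy < out_h and 0 <= ox < out_w:
                    frame[oy][ox] = px
    return frame


video = Plane("video", [["V"]], x=0, y=0, width=4, height=2, z=0)
game = Plane("game", [["G", "G"], ["G", "G"]], x=1, y=0, width=2, height=2, z=1)
apps = Plane("apps", [["A"]], x=3, y=1, width=1, height=1, z=2)
frame = composite([apps, video, game], out_w=4, out_h=2)
```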
     
  9. Ketto

    Newcomer

    Joined:
    Jul 30, 2012
    Messages:
    39
    Likes Received:
    0
    Location:
    Winter Park, Florida; and London UK.
    The DMEs sound cool I guess.
     
  10. Hecatoncheires

    Newcomer

    Joined:
    Jan 11, 2013
    Messages:
    179
    Likes Received:
    0
    The DMEs may be able to speed up data transfer between the embedded RAM pool and the DDR3 RAM pool but then again I can't see where this is an advantage when the competitor uses a single RAM pool with almost three times the bandwidth. Sounds like damage control to me.
     
  11. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    Damage control is a quick response to an accident. A planned custom functional block is not a quick response. It's cost reduction, not damage control.
     
  12. Xenio

    Regular Banned

    Joined:
    Jan 18, 2013
    Messages:
    447
    Likes Received:
    0
    It seems to me that you misunderstand what the DMEs are there for.

    This is what the 4 Orbis CUs can be used for, if they have another path to RAM, given of course that Orbis has no separate RAM block (eSRAM or whatever).
     
  13. SKYSONY

    Newcomer

    Joined:
    Jun 12, 2012
    Messages:
    131
    Likes Received:
    0
    You really sound like Mistercteam now. The DMEs have been explained, and no other forum member has said what you are saying. The DMEs are there to compensate for the DDR3's lack of bandwidth.

    http://www.neogaf.com/forum/showpost.php?p=47375749&postcount=127
     
  14. Hecatoncheires

    Newcomer

    Joined:
    Jan 11, 2013
    Messages:
    179
    Likes Received:
    0
    As far as I understand, the embedded RAM is there to compensate for the lack of bandwidth and the DMEs are there to compensate for the lack of computing power. :wink4:
     
  15. Xenio

    Regular Banned

    Joined:
    Jan 18, 2013
    Messages:
    447
    Likes Received:
    0
    sky*sony*, think twice before you take offence at another member.

    And NO, the DMEs are there to compute functions that on other GPUs are done by the shaders; plus, those units can operate while the GPU is stalled on the compute side or the bandwidth side.
    Be less rude and learn to read.
     
  16. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    4,024
    Likes Received:
    2,851
    The DMEs are what ensure that the GPU doesn't become bandwidth limited (by facilitating tiling and otherwise making sure data is placed where the GPU can access it quickly enough to prevent the compute units from being starved for data) and also to allow useful work to be performed with any leftover bandwidth available when the GPU and CPU aren't using all of the bandwidth of both pools.

    This fits with my suspicion that the custom hardware in Durango is intended to improve utilization of the available system resources, not provide more.
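
    As a rough picture of the kind of tiling-related copy described above, the sketch below pulls a rectangular tile out of a linear, row-major surface, which is the basic job of a strided DMA transfer. The function name, the parameters and the pitch-based layout are assumptions made for illustration, not details of the actual move engines.

```python
def copy_tile(src, pitch, x, y, width, height, bytes_per_pixel=4):
    """Copy a width x height tile out of a row-major surface.

    src   -- bytes-like surface in linear layout
    pitch -- bytes from the start of one row to the start of the next
    x, y  -- top-left pixel of the tile
    """
    row_bytes = width * bytes_per_pixel
    tile = bytearray()
    for row in range(y, y + height):
        start = row * pitch + x * bytes_per_pixel
        tile += src[start:start + row_bytes]
    return bytes(tile)


# A 4x4 surface with 1 byte per pixel, values 0..15 in row-major order.
surface = bytes(range(16))
tile = copy_tile(surface, pitch=4, x=1, y=1, width=2, height=2, bytes_per_pixel=1)
```

    In this picture, only the tile the GPU is about to work on needs to sit in the small, fast pool, while the full surface stays in main RAM.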
     
  17. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,709
    Likes Received:
    145
    Are the rumored 4 CUs different, and more flexible/efficient than the 14 CUs? If so, they can be used independently to prep the data for the other 14 to churn through. It may help to lower bandwidth and compute load, depending on how the devs use them. It's similar in idea to the SPUs. I guess they can also throw all 18 at the same task under "ideal" conditions.

    The DMEs are something else altogether. But I am curious how they compare to the 2 DMA units in an equivalent AMD GPU (besides JPEG compression and the number of units).

    I also think ideally, MS would want their GPU to be closer to GCN2 to be more efficient but I haven't read up enough on it yet. ^_^
     
  18. Xenio

    Regular Banned

    Joined:
    Jan 18, 2013
    Messages:
    447
    Likes Received:
    0
    VGLeaks says
    as the tasks computed by the DMEs are often loaded onto the shaders on classic GPUs
     
    #838 Xenio, Feb 6, 2013
    Last edited by a moderator: Feb 6, 2013
  19. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    627
    Likes Received:
    414
    The primary use I can see is for efficient virtual texturing. The ability to get 4 Mpixels worth of JPEG turned into properly swizzled and prepared textures per frame without having to use any of the primary processing for it is nothing to scoff at for modern virtual texturing engines.
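
    For anyone unsure what "swizzled" means here: textures are usually stored in a non-linear order so that texels close together in 2D are also close together in memory. Below is a minimal sketch using Morton (Z-order) indexing. The real tiling format used by the hardware is not described in the leak, so Morton order is a stand-in chosen for illustration.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of x and y (x in the even bit positions)."""
    index = 0
    for bit in range(bits):
        index |= ((x >> bit) & 1) << (2 * bit)
        index |= ((y >> bit) & 1) << (2 * bit + 1)
    return index


def swizzle(linear, size):
    """Reorder a square, power-of-two, row-major texture into Morton order."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    if len(linear) != size * size:
        raise ValueError("texture must contain size * size texels")
    out = [None] * (size * size)
    for y in range(size):
        for x in range(size):
            out[morton_index(x, y)] = linear[y * size + x]
    return out


swizzled = swizzle(list(range(16)), 4)
```

    Each 2x2 block of the source ends up contiguous in the output, which is the locality property that makes this kind of layout cache friendly.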
     
  20. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
    All this makes me wonder how the GPU sees the two pools of RAM (main RAM and the scratchpad). I was assuming (as were some others) that the move engines' role was to let the GPU see those 2 pools of RAM (or fake it).

    The data provided by VGleaks seems to imply that the GPU by itself is able to deal with those 2 pools of RAM without having to resort to the DMEs.

    The data is a bit confusing to me because it uses "shader" for the GPU; it is unclear how much bandwidth the "shader cores" (SIMDs or CUs) or the ROPs have to the scratchpad memory.


    I have to say I'm a bit lost. Can the CPU access the scratchpad, and would that even help in some way?
    Instead of pre-fetching data using the CPU cores, you would set the DMEs to gather (or scatter) data from main RAM. I would think that the latency for the CPU to read from or write to that pool of memory would be (too) high, but if you can have the CPU pre-fetch from the scratchpad, could it "work"?

    I'm thinking of those big data structures used, for example, by Epic in UE4. Could it be possible to keep them compressed in RAM, load the relevant parts into the scratchpad memory (where they would remain compressed), and then, on request from the CPU, stream and decompress the data (on the fly) into the CPU cache?
    (I wonder the same about virtual texturing, or data structures used by Kinect).

    If it is used by the CPU (too), the 25.6GB/s of bandwidth is less of an issue, because no matter how the system is put together (1 or 2 chips) I don't expect the CPU to be able to use much more bandwidth than that.

    Another idea: could the CPU place commands for the GPU in the scratchpad, with the data compressed on the fly as it is written and decompressed on the fly when the GPU reads it?

    Another thing is tessellation. I remember reading that previous AMD GPUs had to dump data to RAM (depending on the level of geometry amplification). Could it be a win if the GPU could dump that data (compressed on the fly) into the scratchpad?

    It would be interesting if the scratchpad is not used by the whole system (and mostly the GPU) as a monolithic space mostly for rendering (i.e. for bandwidth-intensive operations like blending), but is instead used in many different ways as a buffer by the various units in the system (I mean as a general-purpose scratchpad memory).

    What compression ratio can we expect? Or rather, how "big" could the 32MB memory effectively be made?
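
    The "how big" part is simple arithmetic once a ratio is picked. The sketch below just multiplies the 32MB by a few ratios; the ratios themselves are hypothetical placeholders, since nothing in the leak states what the hardware achieves on real data.

```python
SCRATCHPAD_MB = 32  # scratchpad size discussed in this thread


def effective_capacity_mb(ratio, physical_mb=SCRATCHPAD_MB):
    """Uncompressed data that fits if everything compresses by `ratio`."""
    if ratio < 1:
        raise ValueError("a compression ratio below 1 would expand the data")
    return physical_mb * ratio


# Hypothetical ratios, for illustration only.
for ratio in (1.5, 2.0, 4.0):
    print(f"{ratio}:1 -> {effective_capacity_mb(ratio)} MB of uncompressed data")
```

    This is the best case: it assumes everything in the scratchpad is stored compressed and compresses equally well, which render targets generally would not.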

    EDIT

    Another question that comes to mind is the amount of RAM and the number of CPU cores supposedly reserved to run the "OS"; it sounds like quite a lot to me.
    I wonder if part of that reservation could be there to run the "API" (or system-level driver) Edge spoke about. Could a core or more (as well as some memory) be used to manage the DMEs and to decide which "system" in your program (or game) uses the scratchpad, and how much of it?
    I've no idea here; I'm just wondering (though my wording is unclear) whether a lot of these resources (and going by the rumors there are a lot) are used to make the scratchpad (holding, more often than not, only compressed content) act fairly transparently as what could be a big L3 for the whole system (CPU and GPU). It could stream data in advance into the reserved RAM (be it virtual textures, data structures holding voxels, Kinect data, and whatnot) and then try to get it into the scratchpad in time for either the GPU or the CPU to use.
     
    #840 liolio, Feb 6, 2013
    Last edited by a moderator: Feb 6, 2013