Understanding XB1's internal memory bandwidth *spawn

Discussion in 'Console Technology' started by zupallinere, Sep 11, 2013.

  1. Ceger

    Newcomer

    Joined:
    Aug 21, 2013
    Messages:
    59
    Likes Received:
    1


    Thank you for putting into words what most wanted to say but have had the misfortune of not saying. You bridged the gap of dispute between many members.
     
    #341 Ceger, Oct 7, 2013
    Last edited by a moderator: Oct 7, 2013
  2. Pixel

    Veteran

    Joined:
    Sep 16, 2013
    Messages:
    1,008
    Likes Received:
    477
    I know all about the flash; I was arguing with people that it would be used for OS functionality to enable instant switching, because the HDD needs to be free to facilitate unimpeded data transfer for games, and the OS needs unimpeded program loading to support MS's multitasking vision.

    Back to the GPU architecture and Boyd Multerer's comment:
    Beyond the dual-ported nature of certain varieties of 6T SRAM (including the X1's 6T cells) and higher transistor counts, that screenshot made me wonder if the latency advantage was another reason they spent expensive die area on SRAM rather than eDRAM like the IBM POWER chips. There may be modifications to the architecture so that the low-latency ESRAM provides some benefit. They could have roughly tripled their embedded cache size with eDRAM (not sextupled it, despite DRAM only needing a single transistor per bit).
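    The "tripled, not sextupled" point can be sanity-checked with ballpark cell sizes. The figures below are generic 28nm-class assumptions for illustration, not Xbox One die measurements: although eDRAM stores a bit with one transistor versus SRAM's six, the capacitor and array overhead leave the practical density win closer to 3x.

    ```python
    # Rough, illustrative density comparison between 6T SRAM and 1T1C eDRAM.
    # Cell areas are ballpark 28nm-class figures (assumptions, not measurements).
    SRAM_6T_CELL_UM2 = 0.12   # typical 6T SRAM bit cell, um^2
    EDRAM_CELL_UM2 = 0.04     # typical embedded DRAM bit cell (1T1C), um^2

    esram_bits = 32 * 1024 * 1024 * 8            # 32 MB of ESRAM
    sram_area = esram_bits * SRAM_6T_CELL_UM2 / 1e6   # mm^2, cell array only
    edram_area = esram_bits * EDRAM_CELL_UM2 / 1e6

    print(f"6T SRAM cell array: ~{sram_area:.0f} mm^2")   # ~32 mm^2
    print(f"eDRAM cell array:   ~{edram_area:.0f} mm^2")  # ~11 mm^2
    print(f"density ratio: ~{SRAM_6T_CELL_UM2 / EDRAM_CELL_UM2:.1f}x")
    ```

    So for the same cell-array area, eDRAM would hold roughly 3x the capacity, not 6x, which matches Pixel's "tripled, not sextupled" estimate.
    
    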

    Beyond good prediction and precaching, are there certain types of GPU operations that lend themselves to low latency, such as ROP operations?
     
    #342 Pixel, Oct 7, 2013
    Last edited by a moderator: Oct 7, 2013
  3. Alucardx23

    Regular

    Joined:
    Oct 7, 2009
    Messages:
    549
    Likes Received:
    81
    Beautiful example. :wink:
     
  4. dumbo11

    Regular

    Joined:
    Apr 21, 2010
    Messages:
    440
    Likes Received:
    7
    http://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview

     
  5. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
  6. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    Nah. He resorted to a car analogy which carries a -10 point penalty.

    Incidentally, do car fanatics use computers for analogies when talking about their motors?
     
  7. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Couldn't the "real world" bandwidth numbers have been measured using performance counters on the GPU?
     
  8. Ceger

    Newcomer

    Joined:
    Aug 21, 2013
    Messages:
    59
    Likes Received:
    1
    I can certainly give that a shot. Let me get back to you on that. ;)
     
  9. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Just checked: somewhat recent AMD GPUs offer performance counters named CBMemRead and CBMemWritten, so that is a possibility. But as explained (also by bkilian with his car analogy), a normal game won't sustain 150 GB/s over longer periods (i.e. a frame or even half of one). You need something closely resembling a fillrate test, or you will see only bursts of this bandwidth use. I mean, it would be something like 1.2 kB per pixel at 1080p60. What are you supposed to do with that much data per pixel? Especially as MS has said they got this for blending operations? If so, they weren't doing much else, which means it was basically a fillrate test (at least for the period they measured).
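    The 1.2 kB per pixel figure checks out with a quick back-of-the-envelope calculation (assuming decimal gigabytes and the bandwidth spread evenly over every pixel of a 1080p frame at 60 fps):

    ```python
    # Sanity check: sustaining 150 GB/s across every pixel of 1080p at 60 fps.
    bandwidth = 150e9                        # bytes per second (decimal GB/s)
    pixels_per_second = 1920 * 1080 * 60     # ~124.4 million pixels/s

    bytes_per_pixel = bandwidth / pixels_per_second
    print(f"{bytes_per_pixel:.0f} bytes per pixel")  # ~1206, i.e. ~1.2 kB
    ```
    
    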
     
  10. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Nick Baker didn't say anything about doing blending operations, much less a blending fillrate test for this measurement. All he said was "that is real code that is running at that bandwidth." He didn't say that it was the average over the course of the frame, it's unclear what the time period is. I think all he really needs to demonstrate is that the period is long enough that it could have been indefinite if that's what the game's workload actually mandated.
     
    #350 Exophase, Oct 7, 2013
    Last edited by a moderator: Oct 8, 2013
  11. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    And that says exactly nothing. A fillrate test also executes real code. Otherwise it wouldn't run. The blending stuff comes from the earlier MS statement on the matter and it is the kind of code to run to demonstrate high bandwidth use. It's a natural fit.
     
    #351 Gipsel, Oct 7, 2013
    Last edited by a moderator: Oct 8, 2013
  12. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    He explicitly said it in contrast with a synthetic test or diagnostic. I'm pretty sure that when he said "real code" he didn't just mean any code period, what exactly would be the point of that? Of course it's code. The strong implication is that it comes from a game.
     
  13. taisui

    Regular

    Joined:
    Aug 29, 2013
    Messages:
    674
    Likes Received:
    0
    A: this is the theoretical peak at 204...
    B: that's just theoretical, what's the real blah blah blah
    A: the real number we measured is 150...
    B: well you could be using test code, not game code blah blah blah blah
    A: we measured Forza at 150...
    B: well that's a racing game, what about different genre blah blah blah
    A: Ryse is at the same range...
    B: Crytek's a big studio, little guys can't do it blah blah blah
    A: well we have this indie developer....
    B: he's a genius, that doesn't count for the common case blah blah blah

    I just felt I had to get this out. If you have a problem believing the Earth is round, that's really just your problem.
     
  14. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
    As a side note, be it 135 GB/s or somewhere between 140 and 150 GB/s, I don't find the figure hard to swallow; it's actually not that high for on-chip memory.
     
  15. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    I'm still not sure what you are trying to discuss. Of course one can use a code fragment or a short period (a single millisecond) from a game where it basically resembles a fillrate test (with blending). If you want to quantify the usable bandwidth in any way (and they talk about bandwidth all the time), you look for these situations or construct test cases for them. How else should it work?
    And btw., while you are right, that Nick Baker didn't explicitly mention blending at that point, Andrew Goossen did right after that passage of Nick you referred to:
    I think I used the phrase "natural fit". :roll:

    To sum it up, I don't get your point. What do you want to discuss?
     
  16. adev

    Newcomer

    Joined:
    Oct 2, 2013
    Messages:
    35
    Likes Received:
    0
    I find the back and forth on the ESRAM bandwidth amusing.

    204 GB/s of theoretical bandwidth to 32 MB of memory makes it look like the least likely place to suffer a bottleneck, even if only 130-150 GB/s is actually usable.

    The other pool only has 68 GB/s of bandwidth. If you want a decently performing AAA game you're going to put render buffers in ESRAM irrespective of its performance, because it's clearly better than not using it. If you end up fill or ROP limited you'll be reducing overdraw, blending and resolution as much as possible. It's not necessarily the screen res which has to suffer either; I've worked on games which have been able to optimise this type of situation by reducing cube map render buffer resolution.

    It's hard to look at any of this in isolation. I can't imagine that the One's designers didn't anticipate a fill rate issue given the hardware display plane scaling and blending.
     
  17. Hornet

    Newcomer

    Joined:
    Nov 28, 2009
    Messages:
    120
    Likes Received:
    0
    Location:
    Italy
    Now that we know pretty much all high-level details on Xbox One, I think it would be valuable to focus the discussion on workloads and the most interesting ways to use the ESRAM. There are many questions, which I guess will have different answers depending on the game, but interesting nonetheless.
    1) When using multiple render targets, is it suitable, from a bandwidth perspective, to put some of the render targets in the DDR3 pool? Or is the DDR3 bandwidth going to be just enough for CPU, geometry and textures?
    2) Do all render targets in a deferred renderer require roughly the same amount of read/writes? Or is it possible to pick high read/write render targets and store only them in the ESRAM?
    3) How does Forward+ compare to Deferred rendering in terms of render targets size and bandwidth requirements?
    4) Is tiling going to be more or less costly than in the previous generation? Which techniques can be used to reduce the cost of tiling?
    5) Is it suitable to split render targets in areas with high and low overdraw, in order to determine which pages should be put in ESRAM and which ones should stay in DDR3? Or is it too variable from frame to frame?
    6) Will it be common for multiplatform games to use the majority of PS4 bandwidth for texturing or geometry, requiring ports to use the ESRAM as an asset cache rather than as a target for the ROPs?
    7) Will the flexibility of ESRAM compared to the Xbox 360 EDRAM provide any advantages in things like shadow rendering? Or is the ESRAM size too small for this?
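    Question 2 above, picking the high-traffic render targets for the small ESRAM pool, can be sketched as a greedy packing by estimated traffic per byte. Every target name, format, and traffic figure below is invented purely for illustration; a real engine would profile actual per-target read/write counts.

    ```python
    # Hypothetical greedy placement: put the render targets with the highest
    # estimated traffic-per-byte into ESRAM first, spill the rest to DDR3.
    ESRAM_BYTES = 32 * 1024 * 1024

    # (name, size in bytes, estimated read+write traffic in bytes per frame)
    targets = [
        ("albedo",    1920 * 1080 * 4, 1920 * 1080 * 4 * 2),  # written once, read once
        ("depth",     1920 * 1080 * 4, 1920 * 1080 * 4 * 6),  # heavy test/read traffic
        ("light",     1920 * 1080 * 8, 1920 * 1080 * 8 * 5),  # blended many times
        ("shadowmap", 2048 * 2048 * 4, 2048 * 2048 * 4 * 3),
    ]

    esram, ddr3, used = [], [], 0
    # Sort by traffic density (traffic per byte of footprint), descending.
    for name, size, traffic in sorted(targets, key=lambda t: -t[2] / t[1]):
        if used + size <= ESRAM_BYTES:
            esram.append(name)
            used += size
        else:
            ddr3.append(name)

    print("ESRAM:", esram)  # ['depth', 'light', 'albedo']
    print("DDR3: ", ddr3)   # ['shadowmap'] - too big once the others are placed
    ```

    With these made-up numbers the hottest 1080p targets fit in 32 MB while the shadow map spills to DDR3, which is one plausible answer to question 2.
    
    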
     
  18. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    As far as I'm concerned, "using a section that resembles a fillrate test" and "running a fillrate test" are not the same thing. The important distinction is that if you're writing a synthetic test you can be doing something totally unrealistic that the GPU likes a lot more than anything realistic. Maybe it's representative, maybe it isn't. A game is at least representative of something.

    Still doesn't mean that the test measurement figure was exclusively over a burst using blending. I'm sorry if I'm not giving you enough of a discussion here by saying I think you're assuming too much, because that's all I'm doing.
     
  19. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    A fillrate test is also representative of something. :lol:
    Furthermore, each game, or even each different phase of rendering a frame of the same game, will be representative of something else, something we don't know.
    But here, they looked specifically for high bandwidth usage situations, so we are given a hint, for what it should be representative, the bandwidth. ;)
    Of course it does. At least given the circumstances like what they are talking about (high bandwidth usage scenarios and which specific one [blending] they think of!). Given the reasoning in that interview as well as by several people here in the thread, it would be quite hard to explain how else they got 150+GB/s out of the eSRAM.
    If you refuse to assume anything, you will never arrive at any conclusion. The key is to make reasonable assumptions which are most likely fulfilled. ;)
     
  20. Ceger

    Newcomer

    Joined:
    Aug 21, 2013
    Messages:
    59
    Likes Received:
    1
    The problem people are having is that I'm not sure they are pulling this out of a specific context. The viewpoint is that MS provided average bandwidth figures for actual code, which is taken by nearly everyone here to mean actual games. That creates a large variation in what crosses the bus at any given time. So everyone accepts as believable the statement that bandwidth really only averages 70-80% of peak (MS should be quite knowledgeable, especially as this is an evolution of their 360 technology), and thus that this should apply to all the bandwidths. But you keep pushing that there is no difference between actual code and synthetic code (code produced to test isolated situations), and then support that GDDR5 hits over 90% based on synthetic tests.

    The problem people are having is specifically that they are not sure if you are saying that the ESRAM and DDR3 bandwidths are for some reason relegated to 70-80% while GDDR5 is not, or if you are just presenting the data simply as data.

    If MS provided a bunch of synthetic tests that showed 90%+ of the bandwidth, would you then take that at face value, or argue that it cannot be achieved? Again, are you simply taking the data samples as provided, or making assumptions that define the bandwidths in terms of what could only be called limitations of one over the other?
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.