esram astrophysics *spin-off*

Discussion in 'Console Technology' started by astrograd, Aug 3, 2013.

Thread Status:
Not open for further replies.
  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    The quoted peaks and "real-world" performance figures in the DF article are not described very well and the measurement methods are not disclosed.
    The quoted scenario from the source is heavy on memory traffic and in theory it was supposed to be an illustration of a good use case, so I'm not sure why it would be this twitchy.

    The fact that the documentation doesn't label the interface as 2x what is in the diagram is curious because there's nothing wrong with giving peak figures that assume no banking conflicts and a frequently unrealistic 1:1 read/write ratio. The lack of it is usually consistent with the top numbers being a more restricted use case, and the lack of detail on the measurement method means we can't rule out a range of common errors when it comes to benchmarking complex memory pipelines that will try to prefetch, buffer, and coalesce whatever they can.

    The picture from the leaks is an incomplete one, so I'm awaiting something with more detail--preferably more direct than an anonymous source that is being passed along second and third hand with non-technical parties trying to interpret it before passing it along.
     
  2. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    926
    Likes Received:
    39
    Location:
    On my rock
    I don't know much about memory controllers, so I'm going to ask a stupid question. Would the ROPs be wired directly to both eSRAM and the path to DDR3? Common sense tells me no, and that it would go through some arbiter/switch which would likely be buffered. Could the 192GB/sec simply be referring to the sum of the external bandwidth from the ROPs to said controller, or could it be that certain read/write/mod ops can occur directly against the write buffer without traversing the 102GB/sec path to eSRAM?
     
  3. astrograd

    Regular

    Joined:
    Feb 10, 2013
    Messages:
    418
    Likes Received:
    0
    read + write = 2 ops

    That gives you 7(2) + 1(1) = 15 ops per 8 cycles.

    2 ops for all 8 would be twice the baseline BW (2(109) = 218 GB/s) and correspond to 16 ops per 8 cycles.

    15/16 of 218GB/s = 204GB/s peak.

    Since you can't really do read + write on all cycles it's not really appropriate to say it's 204GB/s peak and just leave it at that. So MS noted the baseline min of 109GB/s as well as the theo. peak of 204GB/s.
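
    A quick sketch of the arithmetic above. It assumes, as the hypothesis does and not as anything Microsoft has confirmed, a 109 GB/s one-direction baseline and a window of 8 cycles in which 7 carry a read plus a write and 1 carries a single op. The function and parameter names are illustrative only.

```python
def peak_bandwidth(baseline_gbps, dual_cycles=7, window=8):
    """Peak bandwidth if `dual_cycles` out of every `window` cycles do read + write."""
    single_cycles = window - dual_cycles
    ops = 2 * dual_cycles + 1 * single_cycles   # 7(2) + 1(1) = 15
    max_ops = 2 * window                        # 16 if every cycle did both
    return 2 * baseline_gbps * ops / max_ops    # 15/16 of twice the baseline


print(peak_bandwidth(109))  # roughly 204 GB/s
```

    Setting dual_cycles=8 gives the full 218 GB/s, and dual_cycles=0 falls back to the 109 GB/s baseline.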
     
  4. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    926
    Likes Received:
    39
    Location:
    On my rock
    I understand the math. However, I think you're making a leap of faith in assuming that they've somehow, originally unknowingly, created a part that can truly read and write in a single cycle. IMO, it's more likely that the 204GB figure is an effective bandwidth rating for a particular sequence of data access where unnecessary ops are skipped due to some flag/signaling.

    Sorry Shift. Started composing prior to your note. This is done. Thx.
     
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    That leaves the question of why it has a 7/8 pattern. Access activity still occurs in step with the cycle-bound state transitions, so what would be carried over to make the 8th cycle different?
    The story seems like it would be more complicated because the memory subsystem would have little trouble filling in read and write traffic for a higher sustained amount unless there were other restrictions in play.

    I'm not sure what would be stolen, since it looks like the GPU memory system is the sole user of the interface.
     
    #125 3dilettante, Aug 26, 2013
    Last edited by a moderator: Aug 26, 2013
  6. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Right, this seems like the most sensible conclusion to me. Not directly related, but Haswell GT3e's eDRAM works similarly in terms of separate read/write bandwidth.
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Can you expand on how that works? I thought there were dedicated bus paths in each direction. Is there a penalty cycle in there that pauses one of the paths after a certain number of ops?
     
  8. astrograd

    Regular

    Joined:
    Feb 10, 2013
    Messages:
    418
    Likes Received:
    0
    It seems concrete to me. Think of it like a prediction made by a certain hypothesis that was verified at Hot Chips. Sure, other hypotheses could somehow lead to a similar outcome, but none were presented afaik. The mechanism for how it can read/write simultaneously on some of the cycles is still unknown, but the DF article seems very much vindicated here (once adjusted for the clock bump).

    If nothing else I think it is worth noting the speculation as it provides some warning about how especially unlikely the peak BW figure might be in terms of actually getting close to it in games.
     
  9. DrJay24

    Veteran

    Joined:
    May 16, 2008
    Messages:
    3,891
    Likes Received:
    633
    Location:
    Internet
    You used the number from DF to form the hypothesis and the number from today to validate it? That is some circular logic.
     
  10. Hardknock

    Veteran

    Joined:
    Jul 11, 2005
    Messages:
    2,203
    Likes Received:
    53
    Well, this is disappointing. So they only cite one example and that example can only reach 133GB/s?
     
  11. jayco

    Regular

    Joined:
    Nov 18, 2006
    Messages:
    848
    Likes Received:
    81
    I'm sorry to ask this question because I think it has already been answered, but how can they obtain a BW of 204GB/s without a dual bus? They clearly state 204GB/s in the presentation, not 102GB/s nor 133GB/s.
     
  12. McHuj

    Veteran Regular Subscriber

    Joined:
    Jul 1, 2005
    Messages:
    1,432
    Likes Received:
    553
    Location:
    Texas
    I assume you'd only be able to get full bandwidth if you were doing something extremely simple like a memcopy.
     
  13. Bagel seed

    Veteran

    Joined:
    Jul 23, 2005
    Messages:
    1,533
    Likes Received:
    16
    Without further clarification, yes. A bit like saying my Camry can do donuts. But really only while on an ice rink with some people pushing one corner. It's not a common situation.
     
  14. Cyan

    Cyan orange
    Legend Veteran

    Joined:
    Apr 24, 2007
    Messages:
    8,572
    Likes Received:
    2,292
    Some renowned tech sites claimed a while ago that the 200GB/s figure was rather creative thinking on Microsoft's side 'cos they were allegedly combining the bandwidth of the CPU-GPU connection (30GB/s), the eSRAM bandwidth (102GB/s) and the DDR3 numbers (68GB/s) to obtain 200GB/s.

    30 + 102 + 68 = 200GB/s.

    But those numbers have changed and don't apply anymore. I am certain that the presentation was mentioning the eSRAM specifically when they talked about 204GB/s of peak bandwidth.

    Other than that, it's quite surprising that the console is going to use 8GB flash memory, which means it can be utilised as a cache, and popping will be a goner. The system looks like a very neat platform.

    In my opinion, the Xbox One is one hell of a console. Technical beauty.
     
  15. Strange

    Veteran

    Joined:
    May 16, 2007
    Messages:
    1,418
    Likes Received:
    40
    Location:
    Somewhere out there
    [strike]Focusing on the eSRAM, we have from the slides, 3 pieces of information:

    1. 109GB/s min
    2. 204GB/s peak
    3. 4*256bit read and write.

    109GB/s obviously is 800Mhz/15*16 bump resulting in 853.3Mhz and then *128 bits = 109.226 GB/s.

    204GB/s, however, is still a mystery.
    it does, however, support the 7/8 cycle read+write theory, as 109.226/8*15 = 204.8 GB/s.

    The old 109.226 + 68 + 30 internal bandwidth that they used @ E3 this time gives us 207. This number is hard to confuse with 204GB/s so that shouldn't be the case this time.




    However, the third line may shed some light into the mystery perhaps...?

    => 4*256bit read and write

    The eSRAM seems to be divided into 4 8MB blocks, each with a 256 bit interface.

    However, in our calculations, we (and they) have always used 800(853)Mhz * 128 bits instead of the full interface of one 8MB esram block, which is specified here to be 256bits.

    Could there be a possibility that if all the read/write occur in a special pattern (i.e. all done on one block instead on touching several blocks, or copying between blocks or some creative usage) you could then achieve the magical read and writing on 7/8 of the cycles?

    That may also explain how this peak is really unobtainable and realistically it ends up in ~133GB/s for some tests.[/strike]

    Edit: Trash that.

    Got bits/bytes mixed up.

    4 blocks at 256 bits/block = 1024 bits = 128 bytes/cycle. So there's still some sort of fishy thing unexplained.
     
    #135 Strange, Aug 27, 2013
    Last edited by a moderator: Aug 27, 2013
  16. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    926
    Likes Received:
    39
    Location:
    On my rock
    You're mixing bits and bytes. Interface has always been assumed to be 4x256bit or 1024 bits wide. 1024 bits / 8 = 128bytes per cycle. 853Mhz * 128 bytes = 109GB/sec.
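
    For anyone who wants to check the units, here is a small sketch of that calculation. It assumes the 853MHz clock is the 800MHz figure bumped by 16/15 and counts a GB as 10^9 bytes. The constant names are illustrative only.

```python
BLOCKS = 4
BITS_PER_BLOCK = 256
CLOCK_HZ = 800e6 * 16 / 15     # ~853.33 MHz, assumed from the 800 MHz bump

bytes_per_cycle = BLOCKS * BITS_PER_BLOCK // 8     # 1024 bits -> 128 bytes
bandwidth_gbps = CLOCK_HZ * bytes_per_cycle / 1e9  # GB taken as 10^9 bytes

print(bytes_per_cycle)            # 128
print(round(bandwidth_gbps, 1))   # 109.2
```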
     
  17. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    926
    Likes Received:
    39
    Location:
    On my rock
    The 109 min to 204 peak is far from explained. If it were as simple as a 7/8 clock penalty, I think they would say that. I still feel there are far more esoteric requirements to exceed 109GB/sec.
     
  18. Strange

    Veteran

    Joined:
    May 16, 2007
    Messages:
    1,418
    Likes Received:
    40
    Location:
    Somewhere out there
    Ah true, didn't notice bit/byte problem, then trash that lol


    Then it still remains unexplained.
     
  19. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    No, I didn't mean it's similar to that level. I just meant separate read and write, therefore stating the "peak" as roughly twice the "min" bandwidth is probably fairly reasonable for graphics, which is what I assume they are doing. The "not perfect double" bit I don't know... could be a number of things I imagine.

    Haswell's eDRAM is somewhat more complicated than simple read/write memory paths, but I'm not sure how much has been publicly disclosed. I think Marco (nAo) talked about it a bit at HPG 2013 though, so maybe there are some slides floating around. That'd be a topic for another thread in any case.
     
  20. Cyan

    Cyan orange
    Legend Veteran

    Joined:
    Apr 24, 2007
    Messages:
    8,572
    Likes Received:
    2,292
    Some really interesting tidbits from this excellent article:

    http://www.extremetech.com/gaming/1...d-odd-soc-architecture-confirmed-by-Microsoft

    I didn't know the eSRAM has been broken into four 8MB chunks. I wonder what the implications of this are. 8MB is a very small amount of memory to fit a full 1080p framebuffer in, isn't it? :eek:
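
    As a rough check on the framebuffer question, here is a sketch that assumes a single 1920x1080 render target at 32 bits per pixel with no MSAA, and treats 8MB as 8 MiB. Those are assumptions for illustration; real setups with depth buffers, multiple targets or wider formats need more.

```python
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 4                 # assumed 32-bit colour, no MSAA
BLOCK_BYTES = 8 * 1024 * 1024       # one 8MB eSRAM block, taken as 8 MiB
TOTAL_BYTES = 4 * BLOCK_BYTES       # four blocks, 32 MiB in total

target_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL

print(target_bytes)                   # 8294400
print(target_bytes <= BLOCK_BYTES)    # True, but only just
print(TOTAL_BYTES // target_bytes)    # whole targets that fit across all four blocks
```

    Under those assumptions one such target just fits in a single block, which leaves very little room for anything else in that block.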
     
