Xbox One (Durango) Technical hardware investigation

Discussion in 'Console Technology' started by Love_In_Rio, Jan 21, 2013.

Thread Status:
Not open for further replies.
  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    It can be important for GPGPU, but it can also be handy for passing commands to the GPU for regular graphics loads with less overhead.

    The link itself is nothing new to APUs, which have always had a full-width non-coherent bus for the GPU to use all possible external bandwidth, and a more narrow coherent bus.
    It doesn't need to be as wide because we can see that the GPU would have no trouble overwhelming the cores with a fraction of its non-coherent bandwidth demands.

    It can be seen that the coherent path is sufficient to allow the GPU to read from one CPU cluster and the media/HDD blocks at full speed.

Why not? GCN is a vast improvement over the prior GPUs, and it has narrowed the gap in ISA features and programming model, compared to SIMD CPU instructions, to the point that the remaining room for improvement is modest.
    Half of the Orbis-only or Durango-only features the rumor mill has gushed about are actually standard GCN features both are likely to have.

    It's going to depend on the workload. The big consumer of bandwidth is obviously the GPU side, and the GPU is the unit that is most tightly paired to the ESRAM.
    Another ancillary benefit of moving some of the most onerous rendering traffic to the ESRAM is that the Jaguar modules can benefit from a DDR3 memory pool that operates at a smaller granularity than GDDR5, and it seems pretty likely that the system memory controller won't have to struggle as much juggling graphics and CPU access patterns.
     
  2. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
    May I ask your point of view on AMD's VLIW4 architecture?
    I've come to the conclusion that it is a bit underrated: there was never a full line of products based on it, so it is not possible to do a fair comparison between AMD's recent architectures.
    I think it cost AMD quite a few transistors to get GCN on its feet. Since MSFT (ESRAM aside) has a fairly conservative transistor budget for its GPU, I still wonder whether going with AMD's VLIW4 architecture would have been a better choice.

    The comparison is pretty much as follows: a 10-CU GCN GPU costs 1.5 billion transistors, while a 10-SIMD VLIW4 design would cost 1 billion. GCN is likely to be a bit denser (I think there are quite a few more memory cells in the design), but I don't think that would make up for the extra horsepower that going with AMD's (mostly unused) previous architecture might have offered them.

    For me the massive win in GCN seems to be foremost compute; for graphics, VLIW4 was nice and really efficient. AMD included nice improvements to the ROPs, texture units, tessellators, etc., and for "free" as far as silicon/transistor cost is concerned.

    I wonder about AMD's choices (and how they translate performance-wise) for their next APUs; I'm not sure the improvements in GCN (compute aside) are going to pay for themselves.
    Looking at the density of the GPU on GlobalFoundries' 32nm process, I'm close to thinking that GCN is not "economical". As far as games are concerned, if they make the switch to GCN without increasing die size, I would not be surprised if performance goes down.

    That VLIW4 architecture was IMO pretty good, and AMD made a strong push toward compute while Nvidia did the contrary... I think a reworked VLIW4 could impress (games only) in performance per mm².

    My last word on the matter: Nintendo should have gone with that architecture; it is possibly the best bang for the buck they could have had. With Trinity we saw a 4-SIMD (VLIW4) GPU part beat the high-end Llano model (a 5-SIMD VLIW5 design) at roughly the same clock. There were quite a few wins in that design.
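    The transistor-budget comparison above is easy to put into rough numbers. Below is a minimal back-of-the-envelope sketch in Python, taking the quoted (unofficial) figures at face value and treating SIMD/CU count as a crude proxy for throughput:

```python
# Back-of-the-envelope version of the transistor-budget comparison above.
# The figures are the poster's estimates, not official numbers.

GCN_TRANSISTORS_10CU = 1.5e9      # quoted estimate for a 10-CU GCN GPU
VLIW4_TRANSISTORS_10SIMD = 1.0e9  # quoted estimate for a 10-SIMD VLIW4 GPU

# How many VLIW4 SIMDs would fit in the same 1.5e9-transistor budget?
simds_in_gcn_budget = 10 * GCN_TRANSISTORS_10CU / VLIW4_TRANSISTORS_10SIMD
print(f"VLIW4 SIMDs for the GCN budget: {simds_in_gcn_budget:.0f}")
```

    Fifteen SIMDs versus ten CUs for the same budget is the "extra horsepower" being argued, before accounting for VLIW4's lower utilisation on compute-heavy workloads.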
     
  3. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
Sure. I'm not writing full posts because I'm mid-conversation. The point is, if a 7770 can be made to run like a 680gtx with far lower power consumption and far lower cost to make, wouldn't AMD + nVidia be aware of this? It's the order-of-magnitude increase some are hoping for that seems extremely implausible. Whatever Tesla nVidia are making now, they could drop a load of CUs, add 32MBs of SRAM cache, and triple performance while reducing cost and power draw. If the difference in performance is that great, how come they aren't doing it?! They can't be that ignorant. Ergo, 32 MBs of SRAM can't be all it takes to triple the performance of a GPU compute architecture. Logic tells us that any improvement will come with a corresponding trade-off - there's no magic bullet. There's never a magic bullet. Every time a magic bullet solution is raised on these boards, it's always proven to be bunk. And turning a 7770 into a 680gtx by adding SRAM is exactly such a mythical magic-bullet super modification that makes no sense.
     
  4. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    7770 vs 680gtx:
    1,500M transistors, 80 watts vs 3,540M transistors, 195 watts.
    Is Durango going to have a 195 watt GPU? No, we all agree. Instead, MS have taken a 7770 and got 680gtx performance from it by adding 32 MBs of SRAM. In some people's theory.

    Unified shader tech didn't come out of the blue. Everyone knew about it and was working towards it. How come no-one learnt that adding a 32 MB SRAM cache can triple overall performance of your GPU? Because it can't. ;)
     
  5. french toast

    Veteran

    Joined:
    Jan 5, 2012
    Messages:
    1,667
    Likes Received:
    9
    Location:
    Leicestershire - England
I agree, a 3x jump seems wishful thinking... but I'm just wondering whether it might not have been as useful in PC components, due to its needing to be specially coded for.
     
  6. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
Shifty, maybe we'll start seeing ESRAM in desktop products with the 20 nm process, when the area will be quite small. I imagine it will also depend on whether GPGPU is successful in the end, or whether homogeneous computing nullifies it when Haswell with AVX2 and its derivatives start rolling out (and by developers' word this is a far better approach to computing than HSA and the like).
     
  7. Averagejoe

    Regular

    Joined:
    Jan 20, 2013
    Messages:
    328
    Likes Received:
    0

Come on, ESRAM will not increase performance on a 7770; the best you can hope for is that it allows it to reach close to its peak performance. The 7770 GPU will not give more than its peak, period. I can somehow believe the whole efficiency thing, but it is hard to believe that Durango, just because it has ESRAM, will increase its performance to 680GTX levels.
     
  8. Ketto

    Newcomer

    Joined:
    Jul 30, 2012
    Messages:
    39
    Likes Received:
    0
    Location:
    Winter Park, Florida; and London UK.
    It really is a crazy notion when you think about it.
     
  9. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
Well, to be fair, the argument is this: suppose a GPU only ever runs at 1/3rd of its peak, so a 680gtx is using just 1/3rd of its total potential and a 7770 likewise. If adding SRAM then enables 100% efficiency, that 7770 would achieve the same results as the grossly inefficient 680gtx.
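    The arithmetic behind that devil's-advocate case is easy to check. A quick sketch in Python, using published peak single-precision throughput for the two cards and the purely hypothetical utilisation figures from the argument above:

```python
# Peak single-precision throughput (GFLOPS), from the published card specs.
PEAK_7770 = 640 * 2 * 1.000   # 640 ALUs x 2 ops/clk x 1.000 GHz = 1280 GFLOPS
PEAK_680  = 1536 * 2 * 1.006  # 1536 ALUs x 2 ops/clk x 1.006 GHz ~ 3090 GFLOPS

# Hypothetical utilisation figures from the devil's-advocate argument.
eff_680  = 1 / 3  # assumed "grossly inefficient" case
eff_7770 = 1.0    # assumed perfect efficiency thanks to the SRAM

achieved_680  = PEAK_680 * eff_680    # ~1030 GFLOPS
achieved_7770 = PEAK_7770 * eff_7770  # 1280 GFLOPS
print(f"680 @ 1/3 efficiency:   {achieved_680:.0f} GFLOPS")
print(f"7770 @ 100% efficiency: {achieved_7770:.0f} GFLOPS")
```

    Only under those assumed efficiencies does the 7770 edge ahead; the whole argument rests on the 1/3 utilisation figure being real and on SRAM eliminating it entirely.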
     
  10. Ketto

    Newcomer

    Joined:
    Jul 30, 2012
    Messages:
    39
    Likes Received:
    0
    Location:
    Winter Park, Florida; and London UK.
So if we add eSRAM to a 680GTX it'll perform at the level of an 880GTX? Why even bother with new architectures? Release a non-eSRAM version of your GPU, wait a while, release a version with eSRAM, rename the GPU and profit. :D

    To think MS discovered this before AMD or Nvidia. Blows my mind!
     
  11. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
Well, Kepler is already very efficient; as you have seen, Nvidia uses low-latency ESRAM in its caches, so making it even more efficient is harder. This brings me to another doubt: will the GPUs in these consoles also have different types of cache memory from the desktop parts?
     
  12. Brad Grenz

    Brad Grenz Philosopher & Poet
    Veteran

    Joined:
    Mar 3, 2005
    Messages:
    2,531
    Likes Received:
    2
    Location:
    Oregon
    Sure, but all those assumptions were being made by people with agendas trying to fabricate an advantage that doesn't really exist. The idea that the Durango GPU design was somehow more efficient than a conventional setup was always fallacious. AMD and nVidia can't both be catastrophically wrong about the amount of SRAM they choose to include as cache in their designs. Large amounts only make sense if your external bandwidth is constrained in some fashion...
     
  13. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
I absolutely agree with that! I was just playing devil's advocate as to how it could be possible, though realistically it isn't in any way. It will be very interesting to see what compromises yield what benefits in Durango, as MS endeavour to get Durango to punch above its weight, but the hopes of high-end performance are as forlorn here as they are with Wii U. There is no magical hardware trick that'll multiply performance (otherwise everyone would be doing it!).
     
  14. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
It is a completely different design point. The lifetime of Orbis/Durango is going to be a little under a decade. The cost of embedding 32MB of RAM is high now but will drop dramatically over time, whereas the cost of a wide bus interfacing to an old memory technology won't.

    What will happen to GDDR5 when die stacking becomes a reality?

    Cheers
     
  15. LightHeaven

    Regular

    Joined:
    Jul 29, 2005
    Messages:
    539
    Likes Received:
    20
    When you put it that way it sounds ridiculous, but in reality it wouldn't be 1.5 billion transistors that through the powers of magic perform akin to a 3.54 billion setup. It would be a similar transistor-count budget (assuming their target was the performance of a 680gtx) designed in a way that could achieve the same ballpark performance, but with less power consumption than straight-up increasing the computational power of the design.

    Of course it wouldn't perform the same as a 680 at every single task, but if the modifications were based on solid data showing the pitfalls that hold back GPU performance in a game much more than processing power does, then yes, you could improve game performance while consuming less.

    To be fair, I'm not expecting it to be a match for a 680gtx, but it seems to me that they had a performance target and developed a system that could achieve it and remain inside their power envelope, rather than "okay, we need to be cheap, so let's put a weak-sauce GPU in here and then do whatever tricks we can to make the setup perform better".
     
  16. french toast

    Veteran

    Joined:
    Jan 5, 2012
    Messages:
    1,667
    Likes Received:
    9
    Location:
    Leicestershire - England
Microsoft has definitely not sat around twiddling their thumbs, merely implementing a slightly better version of eDRAM whilst equipping only 68GB/s of main system RAM... there has to be something special about the SRAM. As it's only 32MB in size, instead of a far more useful 64MB, it must be optimised for latency... whether that's a full-fat 6T SRAM implementation, or some high-end eSRAM, or something else.
     
  17. Averagejoe

    Regular

    Joined:
    Jan 20, 2013
    Messages:
    328
    Likes Received:
    0
    Well, where does this incredibly huge inefficiency of the 680GTX come from to begin with?

    Is the 680GTX really that inefficient?

    All this seems like wishful thinking to me; I don't buy it. I know efficiency can help, but to think 32MB of ESRAM will be a magical performance booster is just silly.

    ESRAM has now transformed into the new secret sauce. :sad:
     
  18. french toast

    Veteran

    Joined:
    Jan 5, 2012
    Messages:
    1,667
    Likes Received:
    9
    Location:
    Leicestershire - England
    Wouldn't the full benefit only be realised if it's specifically coded for? How does that fit into PC gaming, with API layers and game engines already built?

    I would have thought that design would be suited to a console.
     
  19. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
Durango could have no cache misses ever, while the 680 would still have them whenever the data it looks for is not in its caches.
     
  20. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,709
    Likes Received:
    145
If Durango has no cache misses, then it should be possible to make sure the 680, Orbis, Xenos or RSX have no cache misses too. Basically they would all run at full speed, and the units with the most raw power would dominate under that scenario.
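    The weight of the "no cache misses ever" claim can be gauged with the standard average-memory-access-time formula, AMAT = hit time + miss rate x miss penalty: even a small miss rate dominates once the miss penalty runs to hundreds of cycles. A minimal sketch in Python, with purely illustrative latencies (not Durango-specific):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles: hit cost plus expected miss cost."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers only: 4-cycle cache hit, 300-cycle miss to DRAM.
perfect = amat(4, 0.00, 300)  # the hypothetical "no misses ever" case: 4 cycles
typical = amat(4, 0.05, 300)  # a modest 5% miss rate: ~19 cycles, ~4.8x slower
print(f"no misses: {perfect:.1f} cycles, 5% miss rate: {typical:.1f} cycles")
```

    This is why eliminating misses entirely would be such a big deal, and also why no amount of on-chip SRAM guarantees it: the miss rate depends on the working set and the access pattern, not just the cache size.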
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.