Xbox One (Durango) Technical hardware investigation

The "coherent 30GB/s Read and write" can be related to GPGPU?
It can be important for GPGPU, but it can also be handy for passing commands to the GPU for regular graphics loads with less overhead.

The link itself is nothing new to APUs, which have always had a full-width non-coherent bus so the GPU can use all possible external bandwidth, and a narrower coherent bus.
The coherent bus doesn't need to be as wide, because even a fraction of the GPU's non-coherent bandwidth demands would be enough to overwhelm the CPU cores.

It can be seen that the coherent path is sufficient to allow the GPU to read from one CPU cluster and the media/HDD blocks at full speed.
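As a rough back-of-envelope sketch of why 30 GB/s goes a long way for this kind of traffic (the only figure taken from the discussion above is the 30 GB/s itself; the frame rates are just common targets):

```python
# Back-of-envelope: how much data the rumoured 30 GB/s coherent link could
# move per rendered frame at common frame rates.
COHERENT_BW = 30e9  # bytes per second, the figure quoted above (rumoured)

for fps in (30, 60):
    per_frame_mib = COHERENT_BW / fps / 2**20
    print(f"at {fps} fps: ~{per_frame_mib:.0f} MiB of coherent traffic available per frame")
```

Hundreds of MiB per frame is far more than command buffers and shared CPU/GPU data structures typically need, which is the sense in which the narrower coherent path is "sufficient".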

Perhaps this is a silly question. Quick preface: something doesn't wash here. Just a gut feeling on my part. Hard to explain. I just don't quite buy the assumptions that are being passed around to fill in the missing parts/lack of detail in the VGleaks docs. Specifically, I just don't buy that it's a fairly straightforward mod of a 7xxx.
Why not? GCN is a vast improvement over the prior GPUs, and it has narrowed the gap in ISA features and programming model compared to SIMD CPU instructions to the point that the remaining room for improvement is modest.
Half of the Orbis-only or Durango-only features the rumor mill has gushed about are actually standard GCN features both are likely to have.

Assuming that the 32MB of ESRAM is correct, how would anyone here change the GPU to take better advantage of it? MS seems to have spent a fair amount on that addition, and the comments here seem to state it is not enough to make up for a lack of main RAM bandwidth.
It's going to depend on the workload. The big consumer of bandwidth is obviously the GPU side, and the GPU is the unit that is most tightly paired to the ESRAM.
Another ancillary benefit of moving some of the most onerous rendering traffic to the ESRAM is that the Jaguar modules can benefit from a DDR3 memory pool that operates at a smaller granularity than GDDR5, and it seems pretty likely that the system memory controller won't have to struggle as much juggling graphics and CPU access patterns.
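To give a feel for which rendering traffic could plausibly be moved into 32 MB, here is a hypothetical 1080p render-target footprint calculation; the target formats and sizes are illustrative assumptions, not anything from the leaked docs:

```python
# Hypothetical 1080p render-target footprints versus a 32 MiB ESRAM budget.
WIDTH, HEIGHT = 1920, 1080
ESRAM_BYTES = 32 * 2**20

targets = {
    "colour (RGBA8, 4 B/px)":        4,
    "depth/stencil (D24S8, 4 B/px)": 4,
    "G-buffer (4x RGBA8, 16 B/px)":  16,
}

total = 0
for name, bytes_per_px in targets.items():
    size = WIDTH * HEIGHT * bytes_per_px
    total += size
    print(f"{name:32s} {size / 2**20:5.1f} MiB")

print(f"{'total':32s} {total / 2**20:5.1f} MiB "
      f"({'fits within' if total <= ESRAM_BYTES else 'exceeds'} 32 MiB)")
```

A deferred-style setup like this already overflows 32 MiB, so developers would have to pick which targets live in ESRAM and which stay in DDR3, which is the "depends on the workload" point above.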
 
But everything is a tradeoff.
Nothing can be considered in isolation, even seemingly fundamental things like VLIW5->GCN: if it was obviously a win to use vectors of scalars, why didn't they just do that?
The answer is always that it's a combination of things: they may not have thought of it, the data they had on existing shader complexity in games may not have supported it, the additional scheduling logic or register pool resources may have been better spent on more VLIW5 ALUs, etc. etc. etc.
It's something of a vicious cycle on PC where games optimize for the hardware and the hardware optimizes for the games.
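A crude way to picture the VLIW5 trade-off mentioned above: how full the 5-wide bundles get depends on how much instruction-level parallelism the compiler can extract per shader, which is exactly the kind of "data on existing shader complexity" referred to. A toy model, with made-up ILP figures purely for illustration:

```python
# Toy model: fraction of a VLIW5 bundle's slots that get filled when the
# compiler can only extract a given amount of ILP from a shader.
VLIW_WIDTH = 5

def vliw_utilisation(avg_ilp: float) -> float:
    """Slots filled per issued bundle, capped at the bundle width."""
    return min(avg_ilp, VLIW_WIDTH) / VLIW_WIDTH

for ilp in (1.0, 2.5, 3.5, 5.0):  # hypothetical average ILP per shader
    print(f"average ILP {ilp:3.1f}: ~{vliw_utilisation(ilp):.0%} of VLIW5 slots used")
```

A scalar/SIMT design like GCN sidesteps the problem by issuing one operation per lane regardless of ILP, at the cost of the extra scheduling logic the post mentions.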

As was pointed out earlier, NVidia does have lower-latency caches and as a result seems to get more out of its "flops".
May I ask your point of view on AMD's VLIW4 architecture?
I've come to the conclusion that it is a bit underrated, as there was never a full line of products based on that architecture, so it is not possible to do a fair comparison between AMD's last architectures.
I think it cost AMD quite a few transistors to get GCN on its feet, and since MSFT (ESRAM aside) has a fairly conservative transistor budget for its GPU, I still wonder whether going with AMD's VLIW4 architecture would have been a better choice.

The comparison is pretty much as follows: a 10 CU GCN GPU costs ~1.5 billion transistors, while a 10 SIMD VLIW4 part would cost ~1 billion. GCN is likely to be a bit more dense (I think there are quite a few more memory cells in the design), but I don't think that would make up for the extra horsepower that going with AMD's (mostly unused) previous architecture might have offered them.
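Taking the post's own rough figures at face value (they are the poster's estimates, not official numbers), and using the fact that both a GCN CU and a VLIW4 SIMD contain 64 ALUs, the peak-ALUs-per-transistor comparison works out like this:

```python
# ALUs per transistor for the two hypothetical 10-unit configurations above
# (transistor counts are the poster's rough estimates).
designs = {
    "GCN, 10 CUs":     {"alus": 10 * 64, "transistors": 1.5e9},
    "VLIW4, 10 SIMDs": {"alus": 10 * 64, "transistors": 1.0e9},
}

for name, d in designs.items():
    per_million = d["alus"] / (d["transistors"] / 1e6)
    print(f"{name:16s}: {d['alus']} ALUs, ~{per_million:.2f} ALUs per million transistors")
```

On those numbers VLIW4 packs roughly 50% more peak ALUs into the same transistor budget; whether those ALUs are actually kept busy is, of course, the whole GCN argument.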

For me the massive win in GCN is foremost compute; for graphics, VLIW4 was nice and really efficient, and AMD included nice improvements to the ROPs, texture units, tessellators, etc. essentially for "free" as far as silicon/transistor cost is concerned.

I wonder about AMD's choices (and how they translate performance-wise) for their next APUs; I'm not sure that the improvements in GCN (compute aside) are going to pay for themselves.
Looking at the density of the GPU on GF's 32nm process, I'm close to thinking that GCN is not "economical". As far as games are concerned, if they make the switch to GCN without increasing the die size, I would not be surprised if performance goes down.

That VLIW4 architecture was IMO pretty good, and AMD made a strong push toward compute while Nvidia did the contrary... I think that a reworked VLIW4 could impress (in games only) in performance per mm^2.

My last word on the matter is that Nintendo should have gone with that architecture; it is possibly the best bang for the buck they could have had. With Trinity we saw a 4 SIMD GPU part (VLIW4) beat the high-end Llano model (a 5 SIMD VLIW5 design) with the clocks being roughly the same. There were quite a few wins in that design.
 
Nvidia _does_ add low-latency SRAM to its products; that article a couple of pages back showed that the only card with sub-20-cycle access times in its caches was the NVidia one. AMD had a 300 cycle minimum, even in its cache.
Sure. I'm not writing full posts because I'm mid-conversation. The point is, if a 7770 can be made to run like a 680gtx with far lower power consumption and far lower cost to make, wouldn't AMD + nVidia be aware of this? It's the order-of-magnitude increase some are hoping for that seems extremely implausible. Whatever Tesla nVidia are making now, they could drop loads of CUs, add 32MB of SRAM cache, and triple performance while reducing cost and power draw. If the difference in performance is that great, how come they aren't doing it?! They can't be that ignorant. Ergo, 32MB of SRAM can't be all it takes to triple the performance of a GPU architecture. Logic tells us that any improvement will come with a corresponding trade-off - there's no magic bullet. There's never a magic bullet. Every time a magic bullet solution is raised on these boards, it's always proven to be bunk. And turning a 7770 into a 680gtx by adding SRAM is such a mythical magic bullet super modification that it makes no sense.
 
It's not cheap silicon. SRAM isn't inconsequential from a cost standpoint.
7770 > 680gtx.
1,500M transistors, 80 watts > 3,540M transistors, 195 watts.
Is Durango going to have a 195 watt GPU? No, we all agree. Instead, MS have taken a 7770 and got 680gtx performance from it by adding 32 MB of SRAM. In some people's theory.

There wasn't a unified shader part in the PC space for a full year after the release of the 360. So MS doing something nVidia hasn't done is not new.
Unified shader tech didn't come out of the blue. Everyone knew about it and was working towards it. How come no one learnt that adding a 32 MB SRAM cache can triple the overall performance of your GPU? Because it can't. ;)
 
I agree a 3x jump seems wishful thinking... but I'm just wondering whether it would not have been as useful in PC components due to it needing to be specially coded for.
 
Shifty, maybe we'll start seeing ESRAM in desktop products with the 20 nm process, when the area will be quite small. I imagine it will also depend on whether GPGPU ends up being successful, or whether homogeneous computing nullifies it when Haswell with AVX2 and its derivatives start rolling out (and by developers' word that is a far better approach to computing than HSA and the like).
 
Real supercomputers have a lot of ESRAM and not many GPUs. I'm not sure how much money Nvidia really makes from that market, but maybe not enough to compete with homogeneous computing beasts and to design specific chips for it. There is a post from sebbi that nails this question, let me look for it.
Here it is:
http://forum.beyond3d.com/showpost.php?p=1654196&postcount=32

Except for the 4-way SMT, he is almost describing Durango.


Come on, ESRAM will not increase performance on a 7770; the best you can hope for is that it allows it to get close to its peak performance. The 7770 GPU will not give more than its peak, period. I can somehow believe the whole efficiency thing, but it is hard to believe that Durango, just because it has ESRAM, will increase its performance to 680GTX levels.
 
Well, to be fair, the argument is that if a GPU is only running at 1/3rd of its peak at any point, so a 680gtx is only using 1/3rd of its total potential and so is a 7770, and adding SRAM then enables 100% efficiency, that 7770 would achieve the same results as the grossly inefficient 680gtx.
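To put numbers on that hypothetical using the public peak figures for the two cards (~1.28 TFLOPS for the 7770, ~3.09 TFLOPS for the GTX 680) and the 1/3-utilisation premise from the argument above:

```python
# The devil's-advocate argument, quantified: a GPU can never deliver more
# than its peak rate, so the question is what utilisation each side reaches.
PEAK_7770 = 1.28e12  # FLOPS, HD 7770 peak
PEAK_680  = 3.09e12  # FLOPS, GTX 680 peak

delivered_680  = PEAK_680 * (1 / 3)   # the hypothetical 1/3 utilisation above
delivered_7770 = PEAK_7770 * 1.0      # the hypothetical "perfect" 7770

print(f"680 at 1/3 utilisation : {delivered_680 / 1e12:.2f} TFLOPS delivered")
print(f"7770 at 100% utilisation: {delivered_7770 / 1e12:.2f} TFLOPS delivered")
```

The arithmetic only holds if a 680 really does idle two-thirds of the time in games and if 32 MB of SRAM really does buy near-perfect utilisation; both premises are what the following replies push back on.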
 
So if we add eSRAM to a 680GTX it'll perform at the level of a 880GTX? Why even bother with new architectures, release a non eSRAM version of your GPU, wait a while, release a version with eSRAM, rename the GPU and profit. :D

To think MS discovered this before AMD or Nvidia. Blows my mind!
 
Well, Kepler is already very efficient; as you have seen, Nvidia uses low-latency SRAM in its caches. So making it even more efficient is more difficult. This brings me to another question: whether the GPUs in these consoles will also have different types of cache memory from the desktop parts.
 
Sure, but all those assumptions were being made by people with agendas trying to fabricate an advantage that doesn't really exist. The idea that the Durango GPU design was somehow more efficient than a conventional setup was always fallacious. AMD and nVidia can't both be catastrophically wrong about the amount of SRAM they choose to include as cache in their designs. Large amounts only make sense if your external bandwidth is constrained in some fashion...
 
I absolutely agree with that! Just playing devil's advocate as to how it could be possible, though realistically not in any way. It will be very interesting to see which compromises yield which benefits in Durango as MS endeavour to get it to punch above its weight, but the hopes of high-end performance are as forlorn here as they are with Wii U. There is no magical hardware trick that'll multiply performance (otherwise everyone would be doing it!).
 
It is a completely different design point. The lifetime of Orbis/Durango is going to be a little under a decade. The cost of embedding 32MB RAM is high now, but will reduce dramatically over time, whereas a wide bus interfacing old memory technology won't.

What will happen to GDDR5 when die stacking becomes a reality?

Cheers
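A rough sketch of the asymmetry being described: the area of an on-die SRAM block shrinks with each process node, whereas a wide external memory interface is pad- and analogue-limited and barely shrinks. The 6T cell sizes and the array overhead factor below are ballpark assumptions, not vendor figures:

```python
# Rough die area of a 32 MiB 6T SRAM block across process nodes.
# Cell sizes and the array-overhead factor are ballpark assumptions.
BITS = 32 * 2**20 * 8   # 32 MiB expressed in bits
OVERHEAD = 1.8          # assumed factor for sense amps, decoders, redundancy

assumed_cell_um2 = {    # assumed high-density 6T cell sizes per node
    "28 nm": 0.127,
    "20 nm": 0.081,
}

for node, cell in assumed_cell_um2.items():
    area_mm2 = BITS * cell * OVERHEAD / 1e6  # convert um^2 to mm^2
    print(f"{node}: ~{area_mm2:.0f} mm^2 for 32 MiB of ESRAM (assumed figures)")
```

A 256-bit GDDR5 interface, by contrast, is dominated by I/O pads and PHY circuitry whose size is set by signalling requirements rather than transistor density, which is why its cost curve is much flatter over the console's lifetime.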
 
When you put it that way it sounds ridiculous, but in reality it wouldn't be 1.5 billion transistors that through the power of magic perform akin to a 3.54 billion transistor setup. It would be a similar transistor budget (assuming their target was the performance of a 680gtx), designed in a way that could achieve the same ballpark performance but with less power consumption than straight up increasing the computational power of the design.

Of course it wouldn't perform the same as a 680 at every single task, but if the modifications were based on solid data showing the pitfalls that hold back GPU performance in a game far more than raw processing power does, then yes, you could improve game performance while consuming less.

To be fair, I'm not expecting it to be a match for a 680gtx, but it seems to me that they had a performance target and developed a system that could achieve it while remaining inside their power envelope, instead of "okay, we need to be cheap, so let's put a weak-sauce GPU in here and then do whatever tricks we can to make the setup perform better".
 
Microsoft has definitely not sat around twiddling their thumbs and merely implemented a slightly better version of eDRAM whilst equipping the system with only 68GB/s of main system RAM... there has to be something special about the SRAM. As it's only 32MB in size instead of a far more useful 64MB, it must be optimised for latency... whether that's a full-fat 6T SRAM implementation, or some high-end ESRAM, or something else.
 
Well, where does this incredibly huge inefficiency of the 680GTX come from to begin with?

Is the 680GTX really that inefficient?

All this to me seems like wishful thinking, really. I don't buy it. I know efficiency can help, but to think 32MB of ESRAM will be a magical performance booster is just silly.

ESRAM has now transformed into the new secret sauce. :cry:
 
Wouldn't the full benefit only be realised if coded for? How does that fit into PC gaming, with API layers and game engines already built?

I would have thought that design would be suited to a console.
 
Durango could have no cache misses ever, while the 680 would still have them whenever the data being looked up is not in its caches.
 
If Durango can have no cache misses, then it should be possible to make sure a 680 or Orbis or Xenos or RSX has no cache misses too. Basically they would all run at full speed, and the units with the most raw power would dominate under this scenario.
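One reason "no cache misses" is not a straight multiplier either way: GPUs already hide much of their memory latency by keeping many wavefronts in flight. A toy occupancy model, with the 300-cycle and 20-cycle figures echoing the cache latencies mentioned earlier in the thread and the ALU workloads invented for illustration:

```python
# Toy latency-hiding model: wavefronts a SIMD needs in flight so its ALUs
# never sit idle waiting on memory, for a given miss latency and amount of
# ALU work between memory accesses.
def wavefronts_needed(miss_latency_cycles: int, alu_cycles_between_misses: int) -> float:
    return 1 + miss_latency_cycles / alu_cycles_between_misses

for latency in (300, 20):     # long trip through the hierarchy vs. a near hit
    for work in (10, 40):     # invented ALU cycles between memory accesses
        n = wavefronts_needed(latency, work)
        print(f"latency {latency:3d} cycles, {work:2d} ALU cycles of work: "
              f"~{n:.0f} wavefronts to stay busy")
```

With enough threads resident, either architecture can keep its units fed most of the time, which is why the raw peak-rate comparison tends to dominate once both sides are reasonably well utilised.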
 