Xbox One (Durango) Technical hardware investigation

Sure. I'm not writing full posts because I'm mid-conversation. The point is, if a 7770 could be made to run like a GTX 680 with far lower power consumption and far lower manufacturing cost, wouldn't AMD and nVidia be aware of this? It's the order-of-magnitude increase some are hoping for that seems extremely implausible. Whatever Tesla part nVidia is making now, they could drop a load of CUs, add 32 MB of SRAM cache, and triple performance while reducing cost and power draw. If the difference in performance is that great, how come they aren't doing it?! They can't be that ignorant. Ergo, 32 MB of SRAM can't be all it takes to triple the performance of a GPU architecture. Logic tells us that any improvement will come with a corresponding trade-off - there's no magic bullet. There's never a magic bullet. Every time a magic-bullet solution is raised on these boards, it's proven to be bunk. And turning a 7770 into a GTX 680 by adding SRAM is exactly that kind of mythical super-modification; it makes no sense.

For a cost comparison you can't just look at a 7770 and, because it doesn't have eSRAM, conclude that it doesn't make financial sense to include eSRAM to boost real-world performance.

It's pretty obvious that eSRAM was included in order to facilitate a large memory pool without the high cost of building that pool out of high-bandwidth GDDR5.

1 GB of GDDR5 isn't going to break the bank on a midrange/budget graphics card. 8 GB of GDDR5 would price it out of the market. Even 8 GB of DDR3 + 32 MB of eSRAM would likely price it out of that market, but it'd still be cheaper to manufacture than 8 GB of GDDR5.

So, for Durango, there are cost savings that make sense for the target market and its intended product design. For the 7770, there is no use case that makes sense. For example, 256-bit DDR3 would give almost the same bandwidth (68 GB/s) as the 128-bit GDDR5 (72 GB/s) on the 7770 - well, except that I believe the 7770 is pad limited, so a 256-bit bus is likely not possible. And a large enough pool of eSRAM, while it might allow better use of the GPU even alongside GDDR5, is obviously a non-starter with regard to cost for that market segment.
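As a quick back-of-envelope check on those bandwidth figures, here is a rough sketch. The DDR3-2133 and 4.5 Gbps GDDR5 data rates are my assumptions, picked because they reproduce the 68 and 72 GB/s numbers above:

```python
# Peak bandwidth = (bus width in bytes) * (effective transfer rate)
def peak_bandwidth_gbs(bus_width_bits, transfer_rate_gtps):
    """Peak bandwidth in GB/s for a given bus width (bits) and data rate (GT/s)."""
    return bus_width_bits / 8 * transfer_rate_gtps

# Assumed: DDR3-2133 on a 256-bit bus (rumored Durango main memory)
# vs. the 7770's stock 128-bit GDDR5 at 4.5 Gbps effective.
print(peak_bandwidth_gbs(256, 2.133))  # ~68.3 GB/s
print(peak_bandwidth_gbs(128, 4.5))    # 72.0 GB/s
```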

That bit above was in reference to the notion, raised in multiple posts here, that eSRAM isn't a cost saver that maintains or potentially improves performance.

People hoping it'll turn into a GTX 680 are being led astray by wishful thinking, or more relevantly by people attempting to dismiss any contribution eSRAM (or anything else) could make. The whole "making a 7770 perform like a GTX 680" idea was brought up by someone who doesn't think anything can boost the performance of the GPU in Durango over and above a similar PC graphics card. People thinking that MS might have put things in to boost the efficiency of the GPU in certain use cases have never thought it would attain the level of a 7870, much less a GTX 680.

But that said, on average it'll have the potential to close the gap between it and a more powerful GPU with a more standard design. At worst it'll be similar to a 7770, but there will be cases where it will be better, and in some cases it may be significantly better. That's about all that can be inferred at the moment from what we have to work with.

And there are certainly areas where the Orbis GPU will be far less efficient. The 32 ROPs that the 7850/7870 contain are already bandwidth starved, for example, and will rarely ever achieve full utilization. Orbis won't be able to make them much more efficient at what they were designed to do. So while 32 looks a whole lot bigger than 16, in the real world that advantage is far smaller.

Hence where claims of higher efficiency getting closer to the real-world performance of a much higher-spec'd GPU come in. The history of PC GPUs is littered with cases of one IHV, at one point or another, making more efficient use of its GPU resources despite not having as many of them, and hence delivering either higher performance or performance much closer than would otherwise be the case.

Again, this is before the ravenous "X console is obviously better than Y console based on this or that unconfirmed rumored leak" crowd starts latching onto this. From what has been revealed, I have never thought, and still don't think, that Durango is going to be faster at 3D rendering than Orbis. The only question in my mind is how close real-world rendering performance will be.

Regards,
SB
 
Durango might actually end up future-proofed from the get-go for next-gen game engines... and perform on par with a GTX 680 in DX11-type games... taking into consideration some form of HSA and low-latency eSRAM... all coded for as a bespoke target with much lower API overhead.

No, of course not. We've already had two of the most respected developers on these forums tell us the eSRAM's benefits to gaming will be somewhere between nothing and minor.

This is because (as has been said many times already) modern GPUs are already very good at hiding latency. That applies to both GCN and Kepler. These GPUs have low-latency caches to assist with memory accesses, and it's these caches, amongst other things, which allow the GPUs to stay fed with data. If those caches were so horribly ineffective that throwing what is effectively an additional 32MB cache on there would improve performance 3x over, don't you think it would have been done a long time ago?
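To make the latency-hiding point concrete, here's a toy Little's-law style estimate. The cycle counts are assumed round figures for illustration only, not measured values:

```python
# How much parallel work does a GPU need in flight to hide a memory stall?
mem_latency_cycles = 400      # assumed cost of a miss that goes all the way to DRAM
alu_cycles_per_wave = 40      # assumed ALU work a wavefront issues between misses

waves_needed = mem_latency_cycles / alu_cycles_per_wave
print(waves_needed)           # 10.0 wavefronts in flight keep the ALUs busy

# GCN can hold up to 10 wavefronts per SIMD and Kepler up to 64 warps per SMX,
# i.e. the hardware is already sized to cover this kind of latency, which is why
# an extra low-latency pool doesn't automatically translate into 3x throughput.
```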

We're not talking about something that needs developer support to utilise here. We're just talking about modifying the cache architecture of the GPU to either increase the size of the existing caches, or add a much larger extra level of cache, to give a three-fold improvement.

That's clearly a ludicrous idea. NV and AMD have access to the same performance analysis tools as Microsoft does. They can see how data flows through the memory architecture in modern games as well as anyone can. There's no way at all that their systems are so inefficient as to allow the addition of an extra 32MB of cache to increase performance by 3x.

As for the whole DX9 vs DX11 shaders talk: shaders in modern PC games != console-level shaders. Shaders are often re-written completely for the PC and can be doing a considerably greater amount of work than their console counterparts. And it's these shaders at their highest settings which modern GPUs are designed to handle. No-one designs a high-end GPU to run games at low settings.

The suggestion by some appears to be that next-generation consoles may be using engines which render the "old" cache subsystem of modern GPUs obsolete for hiding latency in future workloads, and thus Durango will suddenly perform 3x faster thanks to eSRAM. If you want proof of how foolish this argument is, look no further than the lack of eSRAM in Orbis. If 32MB of eSRAM is a magic bullet that makes your GPU 3x more efficient in "future workloads" - be that GPGPU or graphics workloads - with the only barrier to enabling that improvement being the ability to code your game specifically for that eSRAM, console-development style, then why on earth didn't they include it? AMD designed Orbis's GPU just as it designed Durango's. Are we to believe that AMD simply didn't realise this and it took the heavyweight GPU experience of Microsoft to show them? Or perhaps AMD deliberately hobbled Sony by not revealing to them the awesome 3x speed-up a simple pool of eSRAM would give to GCN's performance in future workloads?
 
That's all true, but it is a meaningless comparison if you don't know how it impacts the final performance (aka framerate).

Let me exaggerate just to make a point: imagine that a 680 spent 85% of the frame time idle due to memory latency, had only 10% for ALU work, and the remaining 5% was everything else. In that case, while it's true that Durango's GPU would never match the FLOP output of a 680, it turns out that ALU is simply not a huge part of the frame time, so you could actually have a GPU with fewer ALUs but faster memory that outperforms a 680 in that frame.
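Putting rough numbers on that deliberately exaggerated scenario (all figures are the hypothetical ones from the post, not real measurements):

```python
# Hypothetical frame-time split for the "680" in the exaggerated example above.
stall, alu, other = 0.85, 0.10, 0.05      # fractions of the frame time (made up)

# A hypothetical GPU with half the ALU throughput but memory that cuts stalls by 75%:
frame_b = stall * 0.25 + alu * 2.0 + other
print(frame_b)                            # 0.4625 -> roughly a 2x shorter frame here

# The only point: when stalls dominate, peak FLOPS stop being the deciding factor.
```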

Of course, in reality it's not a scenario as drastic as this, but MS has data showing where all current games spend most of their time, and instead of brute-forcing everything to improve performance, they seem to have designed a system that tackles those bottlenecks. It's not that far-fetched to assume that, for running those games, that setup could perform better.

But that's no guarantee that they will always perform on par either, because games 5-6 years from now can be drastically different from current ones and be better suited to another architecture...

It kind of happened this gen when developers started using deferred renderers, which nullified the multisample advantage that Xenos had at the beginning of the generation, and accentuated its shortcomings.
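For context on why deferred rendering hurt the eDRAM advantage, here is a rough G-buffer sizing sketch. The four-target 32-bit layout is just an assumed, typical example; real engines vary:

```python
# Rough check: does a typical deferred G-buffer fit in Xenos's 10 MB of eDRAM?
width, height = 1280, 720
bytes_per_target = 4                 # assumed 32-bit render targets
num_color_targets = 4                # assumed layout: albedo, normals, material, etc.
depth_bytes = 4

gbuffer_mb = width * height * (num_color_targets * bytes_per_target + depth_bytes) / 2**20
print(round(gbuffer_mb, 1))          # ~17.6 MB - well over 10 MB even without MSAA,
                                     # forcing tiling or moving targets off the eDRAM
```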


http://www.tomshardware.com/charts/...rectX-11-C-Extreme,Marque_fbrandx13,2969.html


Battlefield 3, Ultra, 2560x1440:

GTX 680: 99 FPS.

http://www.tomshardware.com/charts/...rectX-11-C-Extreme,Marque_fbrandx32,2969.html

Battlefield 3, Ultra, 2560x1440:

7770: 10 FPS.

Even the more powerful 7870 doesn't hit 30 FPS.

I don't see 32 MB of eSRAM closing what is basically a 9x or 10x gap.
 
No, of course not. We've already had two of the most respected developers on these forums tell us the eSRAM's benefits to gaming will be somewhere between nothing and minor.

This is because (as has been said many times already) modern GPUs are already very good at hiding latency. That applies to both GCN and Kepler. These GPUs have low-latency caches to assist with memory accesses, and it's these caches, amongst other things, which allow the GPUs to stay fed with data. If those caches were so horribly ineffective that throwing what is effectively an additional 32MB cache on there would improve performance 3x over, don't you think it would have been done a long time ago?

We're not talking about something that needs developer support to utilise here. We're just talking about modifying the cache architecture of the GPU to either increase the size of the existing caches, or add a much larger extra level of cache, to give a three-fold improvement.

That's clearly a ludicrous idea. NV and AMD have access to the same performance analysis tools as Microsoft does. They can see how data flows through the memory architecture in modern games as well as anyone can. There's no way at all that their systems are so inefficient as to allow the addition of an extra 32MB of cache to increase performance by 3x.

As for the whole DX9 vs DX11 shaders talk: shaders in modern PC games != console-level shaders. Shaders are often re-written completely for the PC and can be doing a considerably greater amount of work than their console counterparts. And it's these shaders at their highest settings which modern GPUs are designed to handle. No-one designs a high-end GPU to run games at low settings.

The suggestion by some appears to be that next-generation consoles may be using engines which render the "old" cache subsystem of modern GPUs obsolete for hiding latency in future workloads, and thus Durango will suddenly perform 3x faster thanks to eSRAM. If you want proof of how foolish this argument is, look no further than the lack of eSRAM in Orbis. If 32MB of eSRAM is a magic bullet that makes your GPU 3x more efficient in "future workloads" - be that GPGPU or graphics workloads - with the only barrier to enabling that improvement being the ability to code your game specifically for that eSRAM, console-development style, then why on earth didn't they include it? AMD designed Orbis's GPU just as it designed Durango's. Are we to believe that AMD simply didn't realise this and it took the heavyweight GPU experience of Microsoft to show them? Or perhaps AMD deliberately hobbled Sony by not revealing to them the awesome 3x speed-up a simple pool of eSRAM would give to GCN's performance in future workloads?

Yea, but I do think you're missing the other points of what I was saying... there is obviously API overhead reduction when coding for consoles, the benefit of not having to code for the lowest common denominator (as in PC gaming), the HSA part of it improving efficiency, the latency savings from not having to go to main RAM, the DMA engines, and also the display planes...

Each part of that is nothing revolutionary, but it more than likely will add up to more than the sum of its parts.

If the main argument is "will an HD 7770 with eSRAM pwn a GTX 680 in a PC", then obviously there is not a cat in hell's chance.

However, if it is Durango inside a console, with properly console-coded/optimised games, against a PC gaming rig with a GTX 680... then I could see something quite close.
Just my opinion.
 
I said this in another thread (I think), but when you are comparing what's in Durango to what is in a whole line of consumer GPUs, doesn't the solution have to be scalable to all price points? Sure, you could build a $600 GTX 680 with eSRAM, but it would be a one-off solution in a whole family of graphics cards that need to hit prices as low as $150. They can't build a high-end card, swap out a slower/cheaper memory bus, and chop off CUs to make cheaper versions if relatively high-cost eSRAM is at the heart of the design, right?

EDIT: I'm not making performance claims one way or the other, just trying to answer the "why it's in Durango and not in Kepler" question.

But this isn't necessarily about needing specifically a 32MB pool of eSRAM. The suggestion that any low-latency memory pool could so vastly increase rendering performance implies that the existing caches in these GPUs are either too small or too high-latency to be effective. So simply increasing cache size and/or reducing cache latency would potentially have a massive performance impact, and that's something that could be done across a large part of the GPU range (assuming you could cut out other resources to compensate for the increased die size while still retaining equal or greater performance).
 
http://www.tomshardware.com/charts/...rectX-11-C-Extreme,Marque_fbrandx13,2969.html


Battlefield 3, Ultra, 2560x1440:

GTX 680: 99 FPS.

http://www.tomshardware.com/charts/...rectX-11-C-Extreme,Marque_fbrandx32,2969.html

Battlefield 3, Ultra, 2560x1440:

7770: 10 FPS.

Even the more powerful 7870 doesn't hit 30 FPS.

I don't see 32 MB of eSRAM closing what is basically a 9x or 10x gap.

Now show me the test at 1920x1080 with the system having 8 Jaguar cores as the CPU, and maybe you'll be getting close to something useful.
 
And there are certainly areas where the Orbis GPU will be far less efficient. The 32 ROPs that the 7850/7870 contain are already bandwidth starved, for example, and will rarely ever achieve full utilization. Orbis won't be able to make them much more efficient at what they were designed to do. So while 32 looks a whole lot bigger than 16, in the real world that advantage is far smaller.

Hence where claims of higher efficiency getting closer to the real-world performance of a much higher-spec'd GPU come in. The history of PC GPUs is littered with cases of one IHV, at one point or another, making more efficient use of its GPU resources despite not having as many of them, and hence delivering either higher performance or performance much closer than would otherwise be the case.


Regards,
SB

One problem with your argument.

Neither the 7850 nor the 7870 has 4 CUs set apart. Maybe Sony knew the 18 CUs would be starved by those 32 ROPs, so they left 14 and put 4 apart for that same reason - you can use the other 4 CUs instead of waiting on them. It would also explain the rumors of minimal impact if the separate CUs are used for rendering.

It's funny how Durango will be super efficient but Orbis will not be able to achieve the same...
 
Now show me the test at 1920x1080 with the system having 8 Jaguar cores as the CPU, and maybe you'll be getting close to something useful.



http://www.tomshardware.com/charts/...X-11-B-Performance,Marque_fbrandx13,2968.html

1080p

GTX 680: 143 FPS.

http://www.tomshardware.com/charts/...X-11-B-Performance,Marque_fbrandx32,2968.html

1080p

7770: 35 FPS...

Now I don't know why you want an 8-core Jaguar; I am sure the Intel CPU in this test setup was far more powerful and capable than an 8-core Jaguar.
 
One problem with your argument.

Neither the 7850 nor the 7870 has 4 CUs set apart. Maybe Sony knew the 18 CUs would be starved by those 32 ROPs, so they left 14 and put 4 apart for that same reason - you can use the other 4 CUs instead of waiting on them. It would also explain the rumors of minimal impact if the separate CUs are used for rendering.

It's funny how Durango will be super efficient but Orbis will not be able to achieve the same...

It'd be nice if you actually knew what you were talking about.

Let's put this in a way that you can hopefully understand it. The 7970 has 32 ROPs and 32 CUs as well as 264 GB/s bandwidth.

Note the same number of ROPs? 16 ROPs may have been too little for the 78xx line, but 32 was definitely overkill. There isn't a finer granularity available, hence the 32 ROPs on the 78xx. And hence lower utilization of those more plentiful resources is better than higher utilization of fewer resources that have the potential to be a significant bottleneck.

In the same way, look at the 7870 compared to the 7850. They have the same amount of bandwidth (153.6 GB/s). That figure is likely spec'd with regard to the 7870, meaning the 7850 likely isn't making the most efficient use of that bandwidth in most cases. Hence pure bandwidth numbers don't reflect pure performance advantages or disadvantages.
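As a rough illustration of why those 32 ROPs can be bandwidth starved, here is a back-of-envelope sketch. The 1 GHz clock and simple 32-bit blend case are assumptions; real workloads mix formats, depth traffic and compression:

```python
# Back-of-envelope ROP bandwidth demand on a 7870 vs. what the board actually has.
rops = 32
engine_clock_ghz = 1.0          # assumed 7870 engine clock
bytes_per_pixel = 4             # 32-bit color target
blend_rw_factor = 2             # alpha blending reads and writes the target

peak_rop_demand_gbs = rops * engine_clock_ghz * bytes_per_pixel * blend_rw_factor
available_gbs = 153.6           # total board bandwidth, shared with textures, depth, etc.

print(peak_rop_demand_gbs, available_gbs)   # 256.0 GB/s of colour traffic vs 153.6 GB/s total
# Peak ROP demand alone exceeds total bandwidth, so the full 32 ROPs rarely stay busy.
```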

18 CUs are certainly not going to be starved by 32 ROPs.

/me tries hard not to use the rolley eyes smiley. :p

At this point I'm not sure why I even bother replying to you anymore.

Regards,
SB
 
Well, that performance difference shrank from 10x to 4x. I wonder what the difference would be at 720p, or some in-between resolution like 1440x1080?
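For reference, the pixel counts behind those resolutions. Framerate doesn't scale linearly with pixel count, so this is only a rough guide to how much fill and bandwidth pressure drops:

```python
# Pixel counts for the resolutions in question, relative to 2560x1440.
resolutions = {"2560x1440": 2560 * 1440, "1920x1080": 1920 * 1080,
               "1440x1080": 1440 * 1080, "1280x720": 1280 * 720}
base = resolutions["2560x1440"]
for name, pixels in resolutions.items():
    print(name, pixels, round(pixels / base, 2))  # fraction of the 1440p pixel load
# 1080p is ~0.56x and 720p ~0.25x of the 1440p load, which is part of why the gap shrinks.
```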
 
It'd be nice if you actually knew what you were talking about.

Let's put this in a way that you can hopefully understand it. The 7970 has 32 ROPs and 32 CUs as well as 264 GB/s bandwidth.

Note the same number of ROPs? 16 ROPs may have been too little for the 78xx line, but 32 was definitely overkill. There isn't a finer granularity available, hence the 32 ROPs on the 78xx. And hence lower utilization of those more plentiful resources is better than higher utilization of fewer resources that have the potential to be a significant bottleneck.

18 CUs are certainly not going to be starved by 32 ROPs.

/me tries hard not to use the rolley eyes smiley. :p

At this point I'm not sure why I even bother replying to you anymore.

Regards,
SB

Cheers, you know more than me. :smile:


Mind you, I am not the smartest guy about this, but I do know that efficiency will help you only up to a certain point; there is nothing more that can be done after that.

This is what I get from MS's 32 MB of eSRAM: they cheaped out with DDR3 because they are targeting casuals with Kinect; graphics is not the primary focus this time.

It's funny you say I don't know anything, but even ERP made a comment about being more worried about Durango's low ROP count. Others have expressed concern about it as well; maybe they are not as optimistic as you are about it, or don't share your opinion.
 
Durango, like any other GPU, has its limits. eSRAM, in the best-case scenario, can help the 7770 achieve its peak, not go over that peak. No matter how some people try to paint this, the 7770's peak is far, far away from the GTX 680's peak.

His argument is that 7770 peak > 680 "typical usage". He isn't suggesting it will perform beyond its own peak, nor is he comparing anything to the 680's theoretical peak. I've heard insiders on GAF suggest efficiency will be a pretty important thing, but thus far I've heard nothing remotely quantitative about how inefficient modern GPUs like the 680 actually are in real-world usage.

There is also the factor of design goals and implementation to consider. By which I mean, if Durango has a significant advantage over other modern setups in leveraging virtualized tech (significantly more efficient with both processing power and bandwidth), then it could be a pretty big deal if devs utilize that approach. Hell, maybe MS's dev tools are being built to virtualize the assets for devs somehow.
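To show the arithmetic behind that "7770 peak vs. 680 typical usage" framing: the peak FP32 numbers come from the public specs, but the utilization levels are purely illustrative assumptions, since as noted nobody has published real figures:

```python
# Peak FP32 throughput from public specs.
peak_7770_tflops = 640 * 2 * 1.00 / 1000    # 640 SPs * 2 ops/clk * 1.0 GHz   = 1.28 TFLOPS
peak_680_tflops = 1536 * 2 * 1.006 / 1000   # 1536 cores * 2 ops/clk * 1.006 GHz ~ 3.09 TFLOPS

# Hypothetical utilization of the 680's peak -- made-up values for illustration only.
for util_680 in (0.6, 0.5, 0.4):
    realized_680 = peak_680_tflops * util_680
    print(util_680, round(realized_680, 2), realized_680 < peak_7770_tflops)
# Only if the 680's realized throughput fell to ~41% of peak would a fully utilized
# 7770-class part match it on ALU alone -- which shows how strong that claim is.
```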
 
His argument is that 7770 peak > 680 "typical usage". He isn't suggesting it will perform beyond its own peak, nor is he comparing anything to the 680's theoretical peak. I've heard insiders on GAF suggest efficiency will be a pretty important thing, but thus far I've heard nothing remotely quantitative about how inefficient modern GPUs like the 680 actually are in real-world usage.

There is also the factor of design goals and implementation to consider. By which I mean, if Durango has a significant advantage over other modern setups in leveraging virtualized tech (significantly more efficient with both processing power and bandwidth), then it could be a pretty big deal if devs utilize that approach. Hell, maybe MS's dev tools are being built to virtualize the assets for devs somehow.

That is the problem: what is the GTX 680's so-called typical use?

How can anyone measure this, even more so on hardware they basically know nothing about?

MS can claim 1080p with 8xAA; actually reaching that is another thing. I learned that with the Xbox 360, which was said to have all its games at 720p minimum with 4xAA - that was the estimate and it fell short. Sony also fell short with their claims of 1080p at 60 FPS.
 
Cheers, you know more than me. :smile:


Mind you, I am not the smartest guy about this, but I do know that efficiency will help you only up to a certain point; there is nothing more that can be done after that.

This is what I get from MS's 32 MB of eSRAM: they cheaped out with DDR3 because they are targeting casuals with Kinect; graphics is not the primary focus this time.

It's funny you say I don't know anything, but even ERP made a comment about being more worried about Durango's low ROP count. Others have expressed concern about it as well; maybe they are not as optimistic as you are about it, or don't share your opinion.

I'm not "optimistic." Perhaps you keep missing my repeated statements that Durango is unlikely to be able to do 3D rendering as fast as Orbis?

I don't care which console is better at rendering 3D in anything but an academic way. It's interesting. The designs are interesting. Speculating about it is interesting.

I'd be just as happy if the specs we were talking about were labeled console X and console Y and no-one had any idea which company was behind them. Because I do not care who "wins" the spec war. I like Sony just as much as I like Microsoft. I have no vested interest in Sony or Microsoft or Playstation or Xbox. I have a Sony receiver, two Sony cameras, and a Sony stereo system in my car. :p I also have a PS1 and PS2. I also have an Xbox and X360. I also have a Wii. They are appliances. They are nothing to become fanatical about.

Speculating about the leaked hardware is interesting. Putting blinders on (too much invested into one console or the other) to limit how you view the leaked specifications doesn't serve any purpose.

I'd speculate just as much about Orbis, except there hasn't been much revealed about it that's mysterious. It's pretty standard with regard to PC computing; hence, not much to speculate on, as what has been revealed is fairly well understood. That doesn't mean there isn't something special there that would be interesting to speculate about, only that nothing interesting has been revealed.

Durango on the other hand has bits and bobs that are interesting because they aren't well understood, especially with regards to how they affect performance.

Regards,
SB
 
No, of course not. We've already had two of the most respected developers on these forums tell us the eSRAM's benefits to gaming will be somewhere between nothing and minor.

This is because (as has been said many times already) modern GPUs are already very good at hiding latency. That applies to both GCN and Kepler. These GPUs have low-latency caches to assist with memory accesses, and it's these caches, amongst other things, which allow the GPUs to stay fed with data. If those caches were so horribly ineffective that throwing what is effectively an additional 32MB cache on there would improve performance 3x over, don't you think it would have been done a long time ago?

We're not talking about something that needs developer support to utilise here. We're just talking about modifying the cache architecture of the GPU to either increase the size of the existing caches, or add a much larger extra level of cache, to give a three-fold improvement.

That's clearly a ludicrous idea. NV and AMD have access to the same performance analysis tools as Microsoft does. They can see how data flows through the memory architecture in modern games as well as anyone can. There's no way at all that their systems are so inefficient as to allow the addition of an extra 32MB of cache to increase performance by 3x.

As for the whole DX9 vs DX11 shaders talk: shaders in modern PC games != console-level shaders. Shaders are often re-written completely for the PC and can be doing a considerably greater amount of work than their console counterparts. And it's these shaders at their highest settings which modern GPUs are designed to handle. No-one designs a high-end GPU to run games at low settings.

The suggestion by some appears to be that next-generation consoles may be using engines which render the "old" cache subsystem of modern GPUs obsolete for hiding latency in future workloads, and thus Durango will suddenly perform 3x faster thanks to eSRAM. If you want proof of how foolish this argument is, look no further than the lack of eSRAM in Orbis. If 32MB of eSRAM is a magic bullet that makes your GPU 3x more efficient in "future workloads" - be that GPGPU or graphics workloads - with the only barrier to enabling that improvement being the ability to code your game specifically for that eSRAM, console-development style, then why on earth didn't they include it? AMD designed Orbis's GPU just as it designed Durango's. Are we to believe that AMD simply didn't realise this and it took the heavyweight GPU experience of Microsoft to show them? Or perhaps AMD deliberately hobbled Sony by not revealing to them the awesome 3x speed-up a simple pool of eSRAM would give to GCN's performance in future workloads?

Wait, are you saying that the inclusion of the eSRAM is useless? I can't recall any dev ever saying that in here. Actually, I can recall several devs speculating on why it was included and the potential use cases for it. I think you should read the post by bkilian. While I won't personally begin to speculate on the use of eSRAM, it seems like it will be a win in terms of reducing latency.

Anyway, all I will say is that for the past 3 to 4 generations we have always had a console or two with a form of embedded RAM; the 360 GPU was designed in a way that a significant silicon budget was spent on the paltry 10 MB of eDRAM. I have to believe there is a reason it keeps popping up in consoles and not on PC GPUs. Consoles are designed to last, and as such there have to be careful considerations about the BOM and the components, to get you the best bang for your buck, and that means doing as much as you can with as little as possible.

We can all speculate about why embedded memory is not used on general-purpose PCs, but the answer is fairly obvious, at least from a high-level point of view: it's simply because they are general purpose, while consoles are specialist systems. This implies a different point of view when designing each one. From what I can gather from comments by bkilian, sebbbi and co., embedded RAM definitely has its benefits, especially when dealing with specialist systems like consoles, supercomputers, etc. So I think saying it has no benefit to rendering and/or throughput is really wrong, especially given the evidence that suggests otherwise.
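For a sense of scale on that 10 MB figure, here is plain framebuffer arithmetic, assuming the usual 32-bit color plus 32-bit depth/stencil formats:

```python
# Xenos eDRAM sizing: what fits in 10 MB at 1280x720 with 32-bit color + 32-bit depth?
width, height = 1280, 720
bytes_per_pixel = 4 + 4                      # color + depth/stencil

def framebuffer_mb(msaa_samples):
    return width * height * bytes_per_pixel * msaa_samples / 2**20

for samples in (1, 2, 4):
    print(samples, round(framebuffer_mb(samples), 2))
# 1 -> ~7.03 MB (fits), 2 -> ~14.06 MB, 4 -> ~28.12 MB (both need predicated tiling),
# which is why even the 360's "free" MSAA came with extra work for developers.
```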
 
But this isn't necessarily about needing specifically a 32MB pool of eSRAM. The suggestion that any low-latency memory pool could so vastly increase rendering performance implies that the existing caches in these GPUs are either too small or too high-latency to be effective. So simply increasing cache size and/or reducing cache latency would potentially have a massive performance impact, and that's something that could be done across a large part of the GPU range (assuming you could cut out other resources to compensate for the increased die size while still retaining equal or greater performance).

My point is that it's relatively expensive regardless: whether it's 32 MB in a $600 card or 8 MB in a $150 card, its share of the BOM is still "high." Also, as bkilian suggests (I think this is what he's saying), introducing such a drastic change to the video card market, one that doesn't perform well in all scenarios, is very risky from a business standpoint, as neither nVidia nor AMD controls the API (and therefore the way the hardware is utilized) the way MS does with Durango.
 