Xbox One (Durango) Technical hardware investigation

Love_In_Rio · Feb 14, 2013

Ketto said:
So if we add eSRAM to a 680GTX it'll perform at the level of an 880GTX? Why even bother with new architectures? Release a non-eSRAM version of your GPU, wait a while, release a version with eSRAM, rename the GPU and profit.

To think MS discovered this before AMD or Nvidia. Blows my mind!

If you add ESRAM to a 660GTX it will not run an Unreal Engine 3 game better than a 680GTX, because its shaders are not as data-access heavy as DirectX 11 ones, and in this case the larger number of ALUs in the 680 would rule.
If the 780GTX were a 680 with 32 MB of ESRAM, you still wouldn't run Unreal Engine 3 games better, so people would see that in the reviews and say... what a shit. But what about Unreal Engine 4, supposedly a DirectX 11-only engine with GPGPU effects like lighting, particle physics and so on? Then... yes, improvements all around for sure.

Shifty Geezer · Feb 14, 2013

LightHeaven said:
When you put it that way it sounds ridiculous, but in reality it wouldn't be 1.5 bn transistors that through the powers of magic would perform akin to a 3.54 bn setup. It would be similar transistor count budgets (assuming their target was the performance of a 680GTX) designed in a way that could achieve the same ballpark performance but with less power consumption than straight up increasing the computational power of the design.

That's a possibility (32 MB at 6T per bit is ~1.5 B transistors), although I contend that Nvidia would go with lower power for the same performance if it was an option, certainly in the supercomputing space where power consumption is a massive decision-making factor to minimise expensive running costs. Have we any confirmation that it's 6T SRAM though? I'm getting lost on what current knowledge is!
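
As a quick sanity check on the ~1.5 B figure, here is a minimal sketch of the arithmetic. It assumes a plain 6T cell and counts only the storage cells themselves (no decoders, sense amplifiers, or redundancy), so a real array would need somewhat more. The function name is made up for illustration.

```python
def sram_cell_transistors(capacity_bytes: int, transistors_per_bit: int = 6) -> int:
    """Transistors in the bit cells alone (ignores decoders, sense amps, redundancy)."""
    return capacity_bytes * 8 * transistors_per_bit


# "32 MB" can be read two ways; both land in the same ballpark.
decimal_mb = sram_cell_transistors(32 * 10**6)  # 32 million bytes
binary_mib = sram_cell_transistors(32 * 2**20)  # 32 MiB

print(f"32 MB  (10^6 bytes): {decimal_mb / 1e9:.2f} B transistors")
print(f"32 MiB (2^20 bytes): {binary_mib / 1e9:.2f} B transistors")
```

Either reading gives roughly 1.5 to 1.6 billion transistors for the cells alone, which is where the ~1.5 B estimate comes from.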

Love_In_Rio · Feb 14, 2013

patsu said:
If Durango has no cache misses, then it should be possible to make sure the 680 or Orbis or Xenos or RSX have no cache misses too. Basically they will all run at full speed, and the units with the most power will dominate under this scenario.

Well, I correct myself: latency misses instead of cache misses, as when you have a cache miss you go to the GDDR5 for data.

Love_In_Rio · Feb 14, 2013

Shifty Geezer said:
That's a possibility (32 MB at 6T per bit is ~1.5 B transistors), although I contend that Nvidia would go with lower power for the same performance if it was an option, certainly in the supercomputing space where power consumption is a massive decision-making factor to minimise expensive running costs. Have we any confirmation that it's 6T SRAM though? I'm getting lost on what current knowledge is!

Shifty, all my conjectures are based on it being 6T SRAM (for the speed and TDP). And also on how boring a next gen with no bizarre architectures would be.

If it's not 6T SRAM, erase all my last posts.

Lucid_Dreamer · Feb 14, 2013

LightHeaven said:
To be fair, I'm not expecting it to be a match for a 680GTX, but it seems to me that they had a performance target and developed a system that could achieve that and remain inside their power envelope, instead of "okay, we need to be cheap, so let's put a weak-sauce GPU in here and then do whatever tricks we can to make the setup perform better".

What's the difference, if the endpoint is the same result?

Xovek · Feb 14, 2013

french toast said:
Definitely Microsoft has not sat around twiddling their thumbs and merely implementing a slightly better version of eDRAM... whilst equipping only 68 GB/s of main system RAM... there has to be something special about the SRAM... as it's only 32 MB in size instead of a far more useful 64 MB... it must be optimised for latency... whether that's a full-fat 6T SRAM implementation, or some high-end eSRAM or something...

Excuse my interruption in this chat, but I've been wondering if this 32 nm 3D eDRAM from IBM could be the "ESRAM" shown by VGL, according to this note:

http://semiaccurate.com/2013/02/07/ibm-adds-5-dimensions-to-chip-stacks-in-one-year/#.URztjeGFBdg

This 3D eDRAM was mentioned in this other press release from IBM in early 2012:

http://www-03.ibm.com/press/us/en/pressrelease/36465.wss

And coincidentally, this 3D eDRAM has been rumored for the manufacture of the next Xbox by IBM and GlobalFoundries:

http://www.xbitlabs.com/news/multim..._Produce_Chips_for_Next_Gen_Xbox_Rumours.html

I apologize for this little deviation from the topic, but your debate is very interesting so I couldn't resist putting this possibility on the table.

patsu · Feb 14, 2013

Love_In_Rio said:
Well, I correct myself: latency misses instead of cache misses, as when you have a cache miss you go to the GDDR5 for data.

If the GPU keeps switching tasks or your data set is small, then you may need to fetch new ones from the RAM and incur latency cost for fetching new chunks of data. If the GPU runs a predictable set of algorithms, it should be possible to hide most of the latencies. In which case, the bandwidth would be more important to sustain your compute nodes.

The developers will try to optimize for the architectures. e.g., Cell LocalStore has 6 cycle latency compared to tens and hundreds elsewhere. It completes MLAA in 5ms over 5 SPUs @ 3.2 GHz. A powerful PC GPU completes a variant of MLAA (less accurate) in 0.1ms despite the longer latency to access its cache and RAM. The devs were able to hide the GPU's latency, and churn through MLAA using all its nodes quickly.

To make sure the CUs are used efficiently for both GPGPU and normal graphics work, AMD advises separating the work. This is probably why Orbis splits its CUs into a 14+4 combo. It will help the GPU to be more efficient overall.

On Durango, the shorter latency to hit ESRAM will give it more efficiency when switching work, assuming it doesn't split its CUs up. It's another way to deal with the same problem.
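
The latency-hiding point can be put in rough numbers with Little's law: the amount of data that has to be in flight equals bandwidth times latency. The sketch below is illustrative only. The 68 GB/s figure is the main RAM bandwidth quoted earlier in the thread, while both latency values and the 64-byte request size are assumptions picked for the example, not known Durango numbers.

```python
import math


def bytes_in_flight(bandwidth_gb_s: float, latency_ns: float) -> float:
    """Little's law: data that must be outstanding to keep the memory bus busy."""
    # GB/s * ns = (1e9 bytes/s) * (1e-9 s) = bytes
    return bandwidth_gb_s * latency_ns


def requests_in_flight(bandwidth_gb_s: float, latency_ns: float,
                       request_bytes: int = 64) -> int:
    """Outstanding requests needed, assuming fixed-size requests (hypothetical 64 B)."""
    return math.ceil(bytes_in_flight(bandwidth_gb_s, latency_ns) / request_bytes)


# Hypothetical latencies: a "long" and a "short" trip to memory.
for label, latency_ns in [("long latency", 300.0), ("short latency", 30.0)]:
    n = requests_in_flight(68.0, latency_ns)
    print(f"{label}: {latency_ns:.0f} ns -> about {n} requests in flight")
```

With these made-up latencies, cutting latency by 10x cuts the parallelism needed to saturate the bus by the same factor, which is the sense in which lower latency makes switching work cheaper. A GPU running predictable work can usually supply the extra in-flight requests anyway, which is why bandwidth ends up mattering more in that case.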

Averagejoe · Feb 14, 2013

Love_In_Rio said:
Durango could have no cache misses ever, while the 680 still has them when the requested data is not in the caches.

I like best-case scenarios too, but that doesn't mean I think Durango will perform even close to a GPU that far, far surpasses it.

Durango, like any other GPU, has its limits. In the best-case scenario ESRAM can help the 7770 achieve its peak, not go over that peak. No matter how some people try to paint this, the 7770's peak is far, far away from the 680GTX's peak.

I am sure when all is said and done Durango will not be even close to that GPU, even with all its efficiencies.
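
To put the "reach its peak, not exceed it" argument in numbers, here is a rough sketch. It models delivered performance as peak times utilization, which is a big simplification, and the 1:3 peak ratio is a placeholder chosen for illustration rather than a measured figure for either chip.

```python
def effective_throughput(peak: float, utilization: float) -> float:
    """Delivered performance under a simple peak * utilization model."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak * utilization


def break_even_utilization(small_peak: float, big_peak: float) -> float:
    """Utilization at which the bigger GPU only matches the smaller one at 100%."""
    return small_peak / big_peak


# Hypothetical relative peaks: the smaller GPU is 1.0, the bigger one 3.0.
small_peak, big_peak = 1.0, 3.0
threshold = break_even_utilization(small_peak, big_peak)

print(f"Smaller GPU at 100%: {effective_throughput(small_peak, 1.0):.2f}")
print(f"Bigger GPU at 50%:   {effective_throughput(big_peak, 0.5):.2f}")
print(f"Bigger GPU must drop below {threshold:.0%} utilization to lose")
```

Under this model, perfect efficiency on the smaller part only catches the bigger one if the bigger one is running at about a third of its peak or less.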

mrcorbo · Feb 14, 2013

All right. Let me present a plausible and not uselessly favorable-to-Durango scenario.

If you measure the average processing performance of Durango's GPU while running an optimized multiplatform game vs. that of a 680 GTX while running the PC version of that same game with all of the architectural limitations of the software and hardware that make up the PC platform, how might they compare?

expletive · Feb 14, 2013

Shifty Geezer said:
That's a possibility (32 MB at 6T per bit is ~1.5 B transistors), although I contend that Nvidia would go with lower power for the same performance if it was an option, certainly in the supercomputing space where power consumption is a massive decision-making factor to minimise expensive running costs. Have we any confirmation that it's 6T SRAM though? I'm getting lost on what current knowledge is!

I said this in another thread (I think), but when you are comparing what's in Durango to what is in a whole line of consumer GPUs, doesn't the solution have to be scalable to all price points? Sure, you could build a $600 680GTX with ESRAM, but it would be a one-off solution in a whole family of graphics cards that need to hit prices as low as $150. They can't build a high-end card, swap in a slower/cheaper memory bus, and chop off CUs to make cheaper versions if relatively high-cost ESRAM is at the heart of the design, right?

EDIT: I'm not making performance claims one way or the other, just trying to answer the "why it's in Durango and not in Kepler" question.

Love_In_Rio · Feb 14, 2013

mrcorbo said:
All right. Let me present a plausible and not uselessly favorable-to-Durango scenario.

If you measure the average processing performance of Durango's GPU while running an optimized multiplatform game vs. that of a 680 GTX while running the PC version of that same game with all of the architectural limitations of the software and hardware that make up the PC platform, how might they compare?

In current games the 680 would be much faster (3x). A next-gen game with data-crunching algorithms? Durango would increase performance a lot and get near a 680. Come on! If MS talks with Epic, DICE... and if what is inside is 6T ESRAM, I suppose they ran their simulations! If it is not 6T ESRAM and the chip is 1.8 billion transistors instead of 3... then I will take it as confirmed that MS is now focused on other things.

mrcorbo · Feb 14, 2013

Love_In_Rio said:
In current games the 680 would be much faster (3x). A next-gen game with data-crunching algorithms? Durango would increase performance a lot and get near a 680. Come on! If MS talks with Epic, DICE... and if what is inside is 6T ESRAM, I suppose they ran their simulations!

The game in my proposed scenario would be developed for next-gen console platforms (Durango & Orbis) and PC.

Love_In_Rio · Feb 14, 2013

mrcorbo said:
The game in my proposed scenario would be a game targeting next-gen console platforms (Durango & Orbis) and PC.

Well, Sweeney has just said next-gen console games will be similar to the ones on current high-end PCs. How much is PR FUD and how much is real? If it were true, at least talking about UE4, this would suggest that the special 4 CUs in Orbis also have something more than an extra ALU, but that is another story.
This is a point on which I would like to read the opinions of people who have programmed Kepler cards and GCN cards, to get a hint of the real efficiency of these things.

patsu · Feb 14, 2013

The console versions will punch above their weight because developers will optimize the same software better. Plus they have low-level access to the GPU. All these latency and bandwidth benefits require developer intervention.

Shifty Geezer · Feb 14, 2013

expletive said:
EDIT: I'm not making performance claims one way or the other, just trying to answer the "why it's in Durango and not in Kepler" question.

There are plenty of reasons to feature UberSRAM in Kepler, but I'm looking squarely at Tesla.

Shifty Geezer · Feb 14, 2013

patsu said:
The console versions will punch above their weight because developers will optimize the same software better. Plus they have low-level access to the GPU.

Durango rumours are saying devs haven't got low-level access.

patsu · Feb 14, 2013

Shifty Geezer said:
Durango rumours are saying devs haven't got low-level access.

Well, perhaps a lower-level API than DirectX?

I also think that workflow is more important this gen because the consoles are rather similar. Any chance we will see heavily baked assets and precalculated data for Durango's 8 GB of memory?

Cjail · Feb 14, 2013

Shifty Geezer said:
Durango rumours are saying devs haven't got low-level access.

Who said that?

LightHeaven · Feb 14, 2013

Lucid_Dreamer said:
What's the difference, if the endpoint is the same result?

The difference is that usually the results are not the same. You have a far greater chance of achieving your goal if you plan for something from the very beginning than if you come up with late "hacks" to improve a system's performance.

XpiderMX · Feb 14, 2013

Is the 7770 confirmed?
