Xbox One (Durango) Technical hardware investigation

Well, if it can move the needle so the Durango GPU is "like" 1.5 TF instead of 1.2, I think that's all it needs, combined with more RAM.

What I find a bit odd is why MS spent all this engineering time and effort instead of just adding a few more CUs. Nothing's more expensive these days than people. They'll quickly get more expensive than silicon.

It would seem like MS/AMD have expended a lot of effort on these "DMA engines", and it's kind of weird.

My guess is that their angle is that in cases where the GPU would otherwise be underutilized, the DME helps it keep peak throughput, in some cases surpassing what a much larger GPU would do. Second, as workflows evolve and change over time, the fact that this architecture is no longer optimal shows less. Finally, devs will have it easier because they don't have to fudge methods to optimize. It should just work.
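For what it's worth, here's a minimal CPU-side sketch of that overlap idea, with a worker thread standing in for a hypothetical move engine that fills the next tile of data while the "compute" loop chews on the current one. All the names, sizes and the work itself are made up for illustration:

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TILE_WORDS  (64 * 1024)
#define NUM_TILES   16

static uint32_t source[NUM_TILES][TILE_WORDS];   /* pretend "main RAM"      */
static uint32_t tile_buf[2][TILE_WORDS];         /* double-buffered fast pool */

struct copy_job { int tile; int slot; };

static void *copy_thread(void *arg)              /* the pretend move engine */
{
    struct copy_job *job = arg;
    memcpy(tile_buf[job->slot], source[job->tile], sizeof tile_buf[0]);
    return NULL;
}

int main(void)
{
    uint64_t sum = 0;
    pthread_t copier;
    struct copy_job job;

    memcpy(tile_buf[0], source[0], sizeof tile_buf[0]);   /* prime slot 0 */

    for (int t = 0; t < NUM_TILES; t++) {
        int cur = t & 1, nxt = cur ^ 1;

        if (t + 1 < NUM_TILES) {                  /* start fetching tile t+1 */
            job.tile = t + 1;
            job.slot = nxt;
            pthread_create(&copier, NULL, copy_thread, &job);
        }

        for (int i = 0; i < TILE_WORDS; i++)      /* "compute" on tile t     */
            sum += tile_buf[cur][i];

        if (t + 1 < NUM_TILES)
            pthread_join(copier, NULL);           /* next tile is now ready  */
    }
    printf("sum = %llu\n", (unsigned long long)sum);
    return 0;
}
```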

However, I still agree with your question on all the NRE this causes. Unless this silicon is going to make more people make exclusive games and content, or it's going to greatly simplify first-party games and allow them to make more and/or lower their costs, what's the point?
 
I should have said earlier: if it's really SRAM, and therefore very low latency, having something to block-copy data into it makes a lot of sense.

If you know the GPU will be reading from a buffer in a cache-unfriendly way, moving that buffer to the low-latency memory would dramatically improve the utilization of the CUs. In some cases more so than having significantly more CUs.
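To make that concrete, here's a tiny CPU-side sketch of the pattern; a plain array and a copy loop stand in for the scratchpad and the DMA block copy, and the sizes and access pattern are purely illustrative, not anything from the actual hardware:

```c
#include <stdint.h>
#include <stdio.h>

#define MAIN_WORDS    (1 << 22)     /* pretend "slow" main RAM          */
#define SCRATCH_WORDS (1 << 14)     /* pretend low-latency scratchpad   */
#define STRIDE        1031          /* deliberately cache-unfriendly    */

static uint32_t main_ram[MAIN_WORDS];
static uint32_t scratchpad[SCRATCH_WORDS];

int main(void)
{
    /* One-off gather into the scratchpad (ideally done by a copy engine,
     * off the critical path)... */
    for (size_t i = 0; i < SCRATCH_WORDS; i++)
        scratchpad[i] = main_ram[(i * STRIDE) % MAIN_WORDS];

    /* ...so that the hot loop (the "shader") only ever touches the small,
     * low-latency pool with linear reads, instead of striding through
     * main RAM on every iteration. */
    uint64_t sum = 0;
    for (size_t i = 0; i < SCRATCH_WORDS; i++)
        sum += scratchpad[i];

    printf("sum = %llu\n", (unsigned long long)sum);
    return 0;
}
```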
 
If I had to guess, the decision to use DDR3 and a fast memory pool probably happened very early on, and the DMA engines are specifically there to deal with moving data between the pools.
I would bet they correlate to perceived pain points developers had dealing with the two pools on 360.

Sounds about right. Making the memory setup feel more like a unified one basically.
 
Pretty much; it sounds like they're rehashing what we've heard so far.
I think that the information from Aegis is correct, because that is what makes sense to me.

Why would MSFT use a 256-bit bus providing quite a bit of bandwidth if the GPU is not going to use it to render?
Trinity does fine with half of that for both textures and rendering, so with the scratchpad memory offering a place to do really bandwidth-intensive rendering operations, why go for a 256-bit bus?
The answer, imo, is that the GPU can render wherever it wants:
on-board scratchpad memory or main RAM.

Another thing MSFT may have wanted is to avoid, as much as possible, moving significant amounts of data around. I expect the final frame buffer to be output from the GPU without having to be written to RAM.

Imho it looks like a really good and flexible set-up.

Parallel to that is the fact that the GPU, as in the 360, acts as the north bridge and is connected to the main RAM.

I would bet the GPU is just below the 185mm^2 cap and that they have just enough room to accommodate the 256-bit bus. I don't think MSFT plans to shrink the GPU anytime soon; they decided (as I hope Sony would do in their place) to go from scratch with something they are comfortable with (wrt production costs).

With 8 Jaguar cores and their L2 most likely below 80mm^2 (could be a bit more if there is some special sauce in it), and the use of standard, cheap RAM, they might have at hand something that can be priced really competitively.

I think that a core system, without Kinect or HDD and with a sane amount of flash, could come in at a really tempting price at launch. They are also likely to be able to produce the system in pretty healthy quantities.

Overall I would say it looks "good", and it is amusing to see some people defending the system now, when a few people here (including me) who defended the idea that those systems could be pretty conservative (an APU + HD 6670 provides pretty much the same peak figures as far as FLOPS are concerned) were mocked :)
 
32 MB is a good sized buffer for 1920x1080 resolution, or can provide space for twice as many buffers at 720p resolution.
Or, in the more probable case with the Wii U, it can also act as a texture cache to compensate for the slow main memory.
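Rough numbers behind that, assuming simple uncompressed 32-bit render targets (my assumption, just to put figures on it):

```c
#include <stdio.h>

/* Rough render-target footprints; 4 bytes/pixel (e.g. RGBA8) assumed,
 * no compression, just to put numbers behind "32 MB is a good size". */
int main(void)
{
    const double MiB = 1024.0 * 1024.0;
    double rt_1080 = 1920.0 * 1080.0 * 4 / MiB;  /* ~7.9 MiB */
    double rt_720  = 1280.0 *  720.0 * 4 / MiB;  /* ~3.5 MiB */

    printf("1080p 32-bit target: %.1f MiB -> %.1f fit in 32 MiB\n",
           rt_1080, 32.0 / rt_1080);
    printf(" 720p 32-bit target: %.1f MiB -> %.1f fit in 32 MiB\n",
           rt_720, 32.0 / rt_720);
    return 0;
}
```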

DDR3 is probably used because it's cheap. Bandwidth of the memory bus is quite a lot more important than the exact memory type, although that has a significant effect on memory latencies as well.

It is in no way comparable to the Wii U hardware and if you don't understand that, you should probably keep on reading instead of posting or attacking people who have a much better understanding of this than you.
 
The same 1.6 ghz CPU rumored to be in PS4?

And hmm, 8GB DDR3 on a 256 bit bus vs 2GB on a 64 bit bus...

Besides everything else.

With people talking up the DMA engines now too, this will run circles around Wii U and probably compete with Orbis (hence lherre's comment they are close like PS3/360)

Am I wrong in thinking a 256-bit bus sounds prohibitively expensive for a console? I would have thought 128-bit GDDR5 would be both less expensive and easier to manufacture (as well as plain faster) than such a config. Especially in the long term.
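On the performance half of that question, peak bandwidth is just bus width times effective data rate. Assuming DDR3-2133 and 5.5 Gbps GDDR5 (speed grades I'm picking purely for illustration), the comparison looks like this:

```c
#include <stdio.h>

/* Peak bandwidth = bus width (bytes) x effective data rate (GT/s).
 * DDR3-2133 and 5.5 Gbps GDDR5 are assumed speed grades. */
int main(void)
{
    double ddr3  = (256 / 8) * 2.133;  /* 256-bit DDR3-2133     -> ~68 GB/s */
    double gddr5 = (128 / 8) * 5.5;    /* 128-bit GDDR5 5.5Gbps -> ~88 GB/s */
    printf("256-bit DDR3-2133 : %.1f GB/s\n", ddr3);
    printf("128-bit GDDR5-5500: %.1f GB/s\n", gddr5);
    return 0;
}
```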
 
As specs go, this thing looks fairly underwhelming though. I was just expecting... More. Then again, I guess Nintendo made an indelible mark last generation.

So there are four of these Data Move Engines?

I'm holding out hope that Orbis is a bit more sophisticated, but I'm not sure what kind of financial position Sony is in to really put the kind of effort into it that they did with the PS3.
 
proelite seems to know about these DMA engines on GAF, and he's throwing around that the Alpha kits have over a 2.5 TF GPU just to emulate the final system performance...

for whatever that's worth... likely not much.

Once again there was a big gulf between the alpha kits and beta kits for Durango on paper. The Alpha kits were needed to emulate the final performance of the combined silicon.

Alpha kits were:

8-core, 16-thread Intel CPUs, probably running at 1.6 GHz
12GB of RAM
High-end AMD 7000 series >2.5 teraflop GPU

he also said this

GPU's are not the same.

There is a reason MS is calling their units shader cores and not compute units.

They're maximized for graphics.

starting to get a little unbelievable...


I feel like a lot of people see that 1.2 teraflop figure and then immediately underestimate the Durango.

It'll take a 2.5 teraflop or better GPU on the PC to match this highly customized GPU, which has customizations that will probably never be ported to PCs for the reason that PCs don't need them.
 
Does the DSP mean we'll finally get great car sounds in racing games?

Riiiiiiidgggeee Racer! :LOL:

proelite seems to know about these DMA engines on GAF... starting to get a little unbelievable...

Well, he also claimed that Orbis and Durango had heavily customized GCN or GCN2 cores and they were certainly not the same.
 
proelite seems to know about these DMA engines on GAF... starting to get a little unbelievable...

I don't see the Data Move Engines being anything more than silicon dedicated to bringing up the efficiency of the whole system's processing load. They probably shouldn't take up much of the die space either.
 
32 MB is a good sized buffer for 1920x1080 resolution, or can provide space for twice as many buffers at 720p resolution.
It seems they are going with 32MB, but I could see them doing every bit as well with less.
Looking at something that seems to have been designed with production costs as a leading constraint, I think they fit as much SRAM as they could on the GPU die while remaining below 185mm^2.

It could be 32MB but 24MB would be just as fine.
32MB allows for 1080p with 2xAA, but more and more engines use deferred rendering now; you're, by the way, a lot more aware of that than I am.
Be it 32MB or 24MB, you won't fit a G-buffer @1080p in there with any kind of AA.
24MB is enough to fit a "tight" G-buffer à la Crysis or Trials HD.
Actually if needed the G-buffer could be rendered in the main RAM.
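For what it's worth, some back-of-the-envelope footprints at 1080p; the layouts below are my guesses for illustration, not the actual CryEngine or Durango formats:

```c
#include <stdio.h>

/* Back-of-the-envelope G-buffer footprints at 1080p.
 * Layouts are illustrative guesses only. */
int main(void)
{
    const double MiB    = 1024.0 * 1024.0;
    const double pixels = 1920.0 * 1080.0;
    const double rt32   = pixels * 4 / MiB;   /* one 32-bit target ~7.9 MiB */

    printf("tight G-buffer (depth + 2x RGBA8): %5.1f MiB\n", 3 * rt32);
    printf("wider G-buffer (depth + 3x RGBA8): %5.1f MiB\n", 4 * rt32);
    printf("same wider layout with 2xMSAA    : %5.1f MiB\n", 8 * rt32);
    return 0;
}
```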

I could see (but you may elaborate :)) some render targets (transparencies/particles and whatnot) being rendered in the scratchpad (at 1080p that's 15MB worth of data?) and then staying there. Then the GPU would read tiles of the G-buffer (from the main RAM) and do the blending in the scratchpad.
If there is post-processing to do, the result would be at hand in the scratchpad.

The "end result" (the frame buffer) would be in the scratchpad and sent to the display without having to copy it back to the main RAM as on the 360.
 
Truth be told I don't expect to see that many games with deferred rendering and multisampling used at the same time. I actually don't expect to see too many 1920x1080 games at all :) 720p is fine enough for most people and it's a far better use of resources to spend that fillrate on something else.
 
proelite seems to know about these DMA engines on GAF... starting to get a little unbelievable...
I would not find it too incredible if MSFT actually opted for a reworked VLIW5 architecture.
It was cheap as far as silicon was concerned, and for graphics it worked.

There is something else on my mind: with the scratchpad (I assume on the GPU die), I could see MSFT having asked for something that could ease BC. I wonder if VLIW5 would make things easier than VLIW4, and down the road the scalar approach in GCN, as Xenos is 4+1 SIMD.
Possibly a brain fart of mine.

All this stuff about DMA engines got me thinking that it could be overblown. What part of Xenos and the daughter die was in charge of moving the data from the daughter die to main RAM? There might have been something doing it. Xenos (as in the shader cores) could not read from the EDRAM, yet the GPU (or something somewhere) initiated the copy from the EDRAM to the main RAM. It was not the "GPU" as such, otherwise you wonder why it could not have simply read straight from the EDRAM.

Even if the scratchpad is on die, you may need something to move data from the scratchpad to the RAM and vice versa. Usually GPUs read and write stuff from the main RAM and that's it.
I guess there is a need for something to make the whole thing happen conveniently, but it looks to me more like a "technical requirement" than a design win.
The design win is having two pools where you can render, one of which is both low latency and high bandwidth, and overall quite a lot of bandwidth to play with (68+100 GB/s).
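Just to put that "quite a lot of bandwidth" into frame terms, using the rumoured figures from this thread and assuming both pools can be streamed concurrently:

```c
#include <stdio.h>

/* Rumoured figures from the thread: ~68 GB/s DDR3 + ~100 GB/s scratchpad. */
int main(void)
{
    double total = 68.0 + 100.0;                     /* GB/s */
    printf("aggregate: %.0f GB/s\n", total);
    printf("per frame @60fps: %.1f GB, @30fps: %.1f GB\n",
           total / 60.0, total / 30.0);
    return 0;
}
```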
 
Still not sure I understand the utility of a data move engine. Unless there is some kind of dual porting on the DDR3, having a DMA engine isn't going to save any bandwidth. Were 360 developers having to do memory copies with the CPU or GPU, taking compute performance away in any meaningful way?

Edit: That last NeoGAF quote does suggest it's a matter of reducing processor loading, so that makes sense.

Should be able to DMA non-contiguous memory into the fast RAM, like Cell's DMA engine.
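Something like a DMA list, i.e. a gather of scattered chunks into one contiguous block in the fast pool. A plain-C stand-in, with the descriptor format, names and sizes all invented for illustration:

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

struct dma_desc { size_t src_offset; size_t bytes; };

static uint8_t main_ram[1 << 20];      /* pretend system memory  */
static uint8_t fast_ram[64 * 1024];    /* pretend on-chip pool   */

static size_t gather(const struct dma_desc *list, size_t n)
{
    size_t dst = 0;
    for (size_t i = 0; i < n && dst + list[i].bytes <= sizeof fast_ram; i++) {
        memcpy(fast_ram + dst, main_ram + list[i].src_offset, list[i].bytes);
        dst += list[i].bytes;          /* chunks land back-to-back */
    }
    return dst;                        /* bytes now contiguous in fast_ram */
}

int main(void)
{
    struct dma_desc list[] = { {0, 4096}, {65536, 4096}, {200000, 8192} };
    printf("gathered %zu bytes\n", gather(list, 3));
    return 0;
}
```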

The GPU is supposed to have fixed-function hardware for some high-level operations, no?
 
aegis claims there's a block missing from the diagram:

Looking at that VGleaks diagram again, I think they fucked up some of the custom hardware stuff. Specifically, there's a hardware block that they don't describe, pertaining to... well. I don't want to get anybody in trouble.
 
There is a reason MS is calling their units shader cores and not compute units.

They're maximized for graphics.

That's obviously just nonsense.

I'm still curious if anyone knows the cost differential between 256-bit DDR3 and 128-bit GDDR5. Both in terms of performance and actual money. I think the believability of this rumor hinges on it and I don't find it too plausible.
 
32 MB is a good sized buffer for 1920x1080 resolution, or can provide space for twice as many buffers at 720p resolution.
At 1080p, subsampled particles will still be high quality, and deferred rendering could take shortcuts, e.g. effectively render a 4:2:2-style image by rendering lighting at 1080p and albedo etc. at 720p.
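Rough numbers on that split, assuming RGBA16F lighting and RGBA8 albedo (formats picked by me purely to illustrate the saving):

```c
#include <stdio.h>

/* Rough numbers behind rendering lighting at 1080p but albedo at 720p.
 * RGBA16F (8 B/px) and RGBA8 (4 B/px) are assumed formats. */
int main(void)
{
    const double MiB = 1024.0 * 1024.0;
    double px1080 = 1920.0 * 1080.0, px720 = 1280.0 * 720.0;

    double all_full = (px1080 * 8 + px1080 * 4) / MiB;  /* everything 1080p */
    double mixed    = (px1080 * 8 + px720  * 4) / MiB;  /* albedo at 720p   */

    printf("both at 1080p : %.1f MiB\n", all_full);
    printf("mixed 1080/720: %.1f MiB (%.0f%% of full)\n",
           mixed, 100.0 * mixed / all_full);
    return 0;
}
```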

I don't see the Data Move Engines being anything more than silicon dedicated to bringing up the efficiency of the whole system's processing load. They probably shouldn't take up much of the die space either.
We're very clueless here. I think it's wrong to assume they are DMA units as they'd be called DMA units in that case. ;) There must be more they are doing with the memory, although what, we can only guess. I assume scaling and filtering of sorts, saving the GPU. Mipmaps and compressed textures and who-knows-what. My custom hardware thread didn't really uncover any clear, obvious uses that convinced me of their value, but here we are with them reportedly in, so they must have some noteworthy value, at which point I feel we should explore all the possible uses and see what they could bring to the table.
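Purely as an example of "more than a plain copy", here's the kind of thing I mean: a 2x2 box-filter downscale applied while data is moved (e.g. building a mip level). Plain C stand-in, everything here is illustrative:

```c
#include <stdint.h>
#include <stdio.h>

/* A downscale-during-copy: 2x2 box filter from an sw x sh source
 * into an (sw/2) x (sh/2) destination, single 8-bit channel. */
static void copy_downscale_2x2(const uint8_t *src, int sw, int sh,
                               uint8_t *dst)
{
    for (int y = 0; y < sh / 2; y++)
        for (int x = 0; x < sw / 2; x++) {
            int sum = src[(2*y)     * sw + 2*x] + src[(2*y)     * sw + 2*x + 1]
                    + src[(2*y + 1) * sw + 2*x] + src[(2*y + 1) * sw + 2*x + 1];
            dst[y * (sw / 2) + x] = (uint8_t)(sum / 4);
        }
}

int main(void)
{
    uint8_t src[4 * 4], dst[2 * 2];
    for (int i = 0; i < 16; i++) src[i] = (uint8_t)(i * 16);
    copy_downscale_2x2(src, 4, 4, dst);
    printf("%u %u %u %u\n", dst[0], dst[1], dst[2], dst[3]);
    return 0;
}
```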
 