Xbox One (Durango) Technical hardware investigation

Cranky · Sep 24, 2013

Shifty Geezer said:
You misread
bkilian is saying MS's maths is off

Actually, that is not what he is saying. He just did some algebra to solve for a scaling factor that fit, then applied that scaling factor to 18 CUs.

See the table below for the impact of the scaling factor.

Scaling factor   12 CU   18 CU   Output 12   Output 18   Difference
0.99             0.89    0.83    10.64       15.02       41%
0.98             0.78    0.70     9.42       12.51       33%
0.97             0.69    0.58     8.33       10.40       25%
0.96             0.61    0.48     7.35        8.63       17%
0.95             0.54    0.40     6.48        7.15       10%

For clarity's sake, the differences are 41%, 33%, 25%, 17%, and 10% as the scaling efficiency goes from 0.99 to 0.95.
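The table's numbers are consistent with a simple model in which each of n CUs runs at an efficiency of s^n, so the effective output is n * s^n. Here is a minimal Python sketch of that model. The function names are illustrative, and the model is inferred from the table's values, not something stated by MS or bkilian.

```python
def effective_output(cus: int, scaling: float) -> float:
    """Effective output under the assumed model: n CUs, each at scaling**n efficiency."""
    return cus * scaling ** cus


def advantage(cus_a: int, cus_b: int, scaling: float) -> float:
    """Fractional advantage of cus_b CUs over cus_a CUs at a given scaling factor."""
    return effective_output(cus_b, scaling) / effective_output(cus_a, scaling) - 1.0


if __name__ == "__main__":
    # Reprint the table rows: scaling factor, output for 12 CUs, output for 18 CUs, difference.
    for s in (0.99, 0.98, 0.97, 0.96, 0.95):
        out12 = effective_output(12, s)
        out18 = effective_output(18, s)
        print(f"{s:.2f}  {out12:6.2f}  {out18:6.2f}  {advantage(12, 18, s):4.0%}")
```

Under this model the 18 CU advantage is simply 1.5 * s^6 - 1, which is why it shrinks so quickly as s drops.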

Gipsel · Sep 24, 2013

mosen said:
7870 XT = 24 CUs ==> 24/16= 1.5 ...

Aah, you mean the Tahiti LE which Sapphire calls 7870XT (it's not an official name afaik, other companies have other names).

blakjedi · Sep 24, 2013

Averagejoe said:
Unified shaders didn't show up on PC until well after the Xbox 360 launched, and MS could not stop talking about them and hyping how powerful the GPU was; I remember it perfectly.

This is vaporware, I think, to keep up the argument that they have a console on par with the competition, which I don't think they have.

That's a cynical view imho.

In reality the question is, what is there left to talk about within the Xbox One architecture which requires an NDA well into September?

dumbo11 · Sep 24, 2013

blakjedi said:
That's a cynical view imho.

In reality the question is, what is there left to talk about within the Xbox One architecture which requires an NDA well into September?

Given the state of the DX11.2 thread, the logical candidate is "tier 2 tiled resources".

Betanumerical · Sep 24, 2013

dumbo11 said:
Given the state of the DX11.2 thread, the logical candidate is "tier 2 tiled resources".

Aren't there already shipping products with Tier 2 PRT?

dumbo11 · Sep 24, 2013

Betanumerical said:
Aren't there already shipping products with Tier 2 PRT?

Yes, most likely the 7790...

Cranky · Sep 24, 2013

OK - how about some more fun with math!

If the current CU scaling is 0.98, then the difference between 12 and 14 CUs is 12%. This is purely hypothetical, in an attempt to illustrate how MS's assertion that a 6% clock increase was better than 2 more CUs could be possible.

However, if the increased clock mitigated some of the bottlenecks from ROPs, primitives, bandwidth, etc., such that the scaling improved from 0.98 to 0.985, then 12 CUs with the improved clock have about a 1% edge over 14 CUs at the old clock.
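The 12% and roughly 1% figures can be reproduced with an n * s^n model, which is an assumption inferred from the numbers quoted earlier in the thread. The clock ratio of 853/800 used below is also an assumption, taken from the publicly announced GPU upclock; the post itself only says 6%.

```python
def effective_output(cus: int, scaling: float, clock_ratio: float = 1.0) -> float:
    """Assumed model: n CUs, each at scaling**n efficiency, times a relative clock."""
    return cus * scaling ** cus * clock_ratio


# Assumption: 800 MHz -> 853 MHz, i.e. roughly the 6% mentioned in the post.
CLOCK_RATIO = 853 / 800

# 14 CUs vs 12 CUs, both at the old clock and 0.98 scaling.
gap_14_vs_12 = effective_output(14, 0.98) / effective_output(12, 0.98) - 1.0

# 12 CUs at the new clock with scaling improved to 0.985, vs 14 CUs at the old clock.
edge_12_upclocked = (
    effective_output(12, 0.985, CLOCK_RATIO) / effective_output(14, 0.98) - 1.0
)

if __name__ == "__main__":
    print(f"14 CU over 12 CU at 0.98: {gap_14_vs_12:.1%}")
    print(f"12 CU upclocked at 0.985 over 14 CU at 0.98: {edge_12_upclocked:.1%}")
```

The result is very sensitive to the assumed scaling improvement: with the clock increase alone and scaling left at 0.98, the 14 CU configuration stays ahead.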

mosen · Sep 24, 2013

blakjedi said:
That's a cynical view imho.

In reality the question is, what is there left to talk about within the Xbox One architecture which requires an NDA well into September?

EG said that they will speak about X1 GPGPU and Microsoft's vision (the importance of memory latency) later. So maybe they have an NDA on GPGPU, or maybe there is nothing at all.

dobwal · Sep 24, 2013

The CU difference in and of itself means very little. MS has already said that Durango has a bunch of special processors that offload some of the tasks typically run on the CUs and CPUs. Durango's and Orbis's CUs may be pretty much the same (outside of some minor tweaks), but Durango's CUs seem to be responsible for a narrower set of work tasks.

You lose the flexibility that the CUs' programmable logic provides, but ASICs specialized for specific tasks are usually more energy efficient, tend to provide greater performance and tend to be cheaper transistor-wise. The question is whether you can keep them busy enough that the inability or limited ability to repurpose/reprogram them isn't a detriment to their overall cost to the silicon budget.

If your only concern is performance then adding CUs makes sense, but if energy efficiency and minimal noise are as important as performance then task-specific logic that can be heavily utilized makes greater sense.

How much the CU gap matters is also a question of whether Durango really benefits from the other, not readily described logic.

Betanumerical · Sep 24, 2013

dobwal said:
The CU difference in and of itself means very little. MS has already said that Durango has a bunch of special processors that offload some of the tasks typically run on the CUs and CPUs. Durango's and Orbis's CUs may be pretty much the same (outside of some minor tweaks), but Durango's CUs seem to be responsible for a narrower set of work tasks.

What ASICs does Durango contain to offload work from the CUs?

GravityX · Sep 24, 2013

Nick is saying that "to get all of this processing out of the box" is a challenge too. "The new CPU core can do six CPU operations per core per cycle, on an eight-core CPU." - 48 ops total.

http://www.engadget.com/2013/05/21/xbox-one-architecture-panel-liveblog/

Some performance numbers were given for the CPU and GPU themselves but these cast more shadow than they do light. Microsoft claimed that each CPU core can perform six operations per cycle. The CPU is believed to be using AMD's Jaguar core, but typically this would only be described as able to handle four operations per cycle.

http://arstechnica.com/gaming/2013/...xbox-ones-internals-while-disclosing-nothing/

I've been told this is incorrect: that Jaguar CPU cores can only perform 4 ops per core.

Could MS have modified the Jaguar cores to perform 6 ops per core?

My apologies if this is a dumb question.

DrJay24 · Sep 24, 2013

The Move engines will save on GPU overhead for data copying, right?

dobwal · Sep 24, 2013

Betanumerical said:
What ASICs does Durango contain to offload work from the CUs?

The sound, compression, swizzle and the other logic blocks described in Durango outside of the core logic that represents the cpu and the gpu.

Betanumerical · Sep 24, 2013

dobwal said:
The sound, compression, swizzle and the other logic blocks described in Durango outside of the core logic that represents the cpu and the gpu.

Do people generally do LZ decompression on the GPU? Or sound on the GPU? I don't think texture swizzling (which, to my knowledge, comes for free on modern GPUs) is a big win either.

The only thing that Orbis seems to be lacking compared to Durango is the SHAPE audio block, and even then it still has an audio decompressor.

3dilettante · Sep 24, 2013

GravityX said:
I've been told this is incorrect: that Jaguar CPU cores can only perform 4 ops per core.

Could MS have modified the Jaguar cores to perform 6 ops per core?

My apologies if this is a dumb question.

Jaguar has two integer pipes, two FP pipes, a load pipe, and a store pipe.
That's the most straightforward interpretation of the 6 ops claim.

The four op claim might reflect that the two-wide front end can decode two reg-mem ops that decompose into an ALU and memory operation, which can mean up to four issue ports can be used at the same time.

The core's back end is wider than the sustainable instruction bandwidth to allow it to catch up after stalls.
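The two figures can be tallied directly from the pipe counts and front-end width given above. This is only a sketch of the arithmetic; the constant names are illustrative, and the values are the ones stated in this post, not taken from AMD documentation.

```python
# Execution pipes per Jaguar core, as listed in the post above.
PIPES = {"integer": 2, "fp": 2, "load": 1, "store": 1}

CORES = 8                    # eight-core CPU, per the quoted architecture panel
DECODE_WIDTH = 2             # two-wide front end
OPS_PER_REG_MEM_INSTR = 2    # a reg-mem op splits into an ALU op and a memory op

# Peak: every pipe issues in the same cycle.
peak_ops_per_core = sum(PIPES.values())

# Sustainable: limited by what the front end can decode per cycle.
sustained_ports_in_use = DECODE_WIDTH * OPS_PER_REG_MEM_INSTR

peak_ops_chip = peak_ops_per_core * CORES

if __name__ == "__main__":
    print(f"peak ops per core per cycle: {peak_ops_per_core}")
    print(f"issue ports fed by the front end per cycle: {sustained_ports_in_use}")
    print(f"peak ops per cycle across all cores: {peak_ops_chip}")
```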

Solarus · Sep 24, 2013

Betanumerical said:
Do people generally do LZ decompression on the GPU? Or sound on the GPU? I don't think texture swizzling (which, to my knowledge, comes for free on modern GPUs) is a big win either.

The only thing that Orbis seems to be lacking compared to Durango is the SHAPE audio block, and even then it still has an audio decompressor.

Well, that and potentially one less display plane, which means that if the PS4 is capable of doing dynamic resolution scaling in hardware, it's going to scale both the game and the UI, so it'll be more noticeable.

Betanumerical · Sep 24, 2013

Solarus said:
Well, that and potentially one less display plane, which means that if the PS4 is capable of doing dynamic resolution scaling in hardware, it's going to scale both the game and the UI, so it'll be more noticeable.

Or the game could composite the UI itself. Does it really take that much power to do one 1080p blend at 60 FPS?
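For a sense of scale, here is a back-of-the-envelope estimate of the memory traffic for one full-screen blend. It assumes 32-bit pixels, one read of the UI layer, one read of the game frame and one write of the result per pixel, with no cache or compression benefit. These are illustrative assumptions, not figures from either console.

```python
WIDTH, HEIGHT = 1920, 1080   # 1080p output
FPS = 60
BYTES_PER_PIXEL = 4          # assumption: 32-bit RGBA
ACCESSES_PER_PIXEL = 3       # assumption: read UI + read game frame + write result

pixels_per_second = WIDTH * HEIGHT * FPS
bytes_per_second = pixels_per_second * BYTES_PER_PIXEL * ACCESSES_PER_PIXEL

if __name__ == "__main__":
    print(f"{pixels_per_second / 1e6:.1f} Mpixels/s blended")
    print(f"{bytes_per_second / 1e9:.2f} GB/s of memory traffic")
```

This covers only a same-size blend; any resampling of the game frame from a lower render resolution is extra work on top of it.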

McHuj · Sep 24, 2013

I'm pretty sure that X1 has additional DMAs to manage the SRAM. 4 DMAs vs 2 in GCN.

Since the SRAM is not a hardware cache, you'll have to explicitly copy data in and out, which is the perfect task for a DMA to perform. Even if it's minimal cycles, it's not something you want to waste compute time on.

If the DMA can compress, decompress, and/or perform any additional (simple) data transforms, that's a benefit.

I've always wondered if they would employ some sort of tiling mechanism for the render targets. If you split the frame into tiles, you should be able to perform operations on the current tile while the DMA engines bring in the next tile and write out the previous result. It's a technique that works well for things like image processing, where you can basically hide all the data movement since it occurs in parallel with the processing.
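The overlap described above can be illustrated with a toy timing model. It is a sketch under stated assumptions, not a description of how Durango actually schedules its DMA engines: one engine copies tiles in, one copies results out, there are two input buffers in SRAM, an input buffer is free again once its tile has been computed, and all times are in arbitrary units.

```python
def serial_time(tiles: int, load: float, compute: float, store: float) -> float:
    """No overlap: every tile is loaded, processed, then written out in turn."""
    return tiles * (load + compute + store)


def pipelined_time(tiles: int, load: float, compute: float, store: float,
                   buffers: int = 2) -> float:
    """Overlap DMA-in, compute and DMA-out, limited by the number of input buffers."""
    load_done, compute_done, store_done = [], [], []
    for i in range(tiles):
        # A load needs a free input buffer: wait for tile i - buffers to be computed.
        buffer_free = compute_done[i - buffers] if i >= buffers else 0.0
        prev_load = load_done[i - 1] if i else 0.0
        load_done.append(max(prev_load, buffer_free) + load)

        # Compute starts once the tile is loaded and the previous tile is finished.
        prev_compute = compute_done[i - 1] if i else 0.0
        compute_done.append(max(load_done[i], prev_compute) + compute)

        # Write-out starts once the tile is computed and the out engine is idle.
        prev_store = store_done[i - 1] if i else 0.0
        store_done.append(max(compute_done[i], prev_store) + store)
    return store_done[-1] if store_done else 0.0


if __name__ == "__main__":
    # Hypothetical numbers: 8 tiles, compute-bound (4 units) with cheap copies (1 unit).
    print("serial:   ", serial_time(8, 1, 4, 1))
    print("pipelined:", pipelined_time(8, 1, 4, 1))
```

When compute dominates, the pipelined total approaches one load plus all the compute plus one store, which is the sense in which the data movement is hidden.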

Betanumerical · Sep 24, 2013

McHuj said:
I'm pretty sure that X1 has additional DMAs to manage the SRAM. 4 DMAs vs 2 in GCN.

Since the SRAM is not a hardware cache, you'll have to explicitly copy data in and out, which is the perfect task for a DMA to perform. Even if it's minimal cycles, it's not something you want to waste compute time on.

If the DMA can compress, decompress, and/or perform any additional (simple) data transforms, that's a benefit.

Every DMA can do texture swizzling, 1 can do JPEG/LZ decoding and 1 can do LZ encoding. The JPEG decoding is used for Kinect 2 it seems.

bkilian · Sep 24, 2013

Betanumerical said:
Or the game could composite the UI itself. Does it really take that much power to do one 1080p blend at 60 FPS?

Well, you also then have to do the scale from render buffer size to screen size yourself, instead of letting the hardware do it for you when it composites the planes. That would probably mean a lower quality scale, and CU time. Not just a simple blend.
