Xbox One (Durango) Technical hardware investigation

Pugger · Feb 4, 2013

shinobi said:
So basically Durango's memory bandwidth is fine and won't be bottlenecked at all? I thought the GPU would benefit from going GDDR5/192 GB/s. I guess it's fine because the GPU is 1.2 TFLOPS; if it were 1.6, would it be kind of starved?

We really don't know enough yet to draw any clear conclusions. With regard to RAM types, we have to assume that MS wanted volume (8GB), which ruled out GDDR5. And the bandwidth for Orbis is 176 GB/s.
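
For reference, peak figures like these fall out of bus width times per-pin data rate. A minimal sketch, assuming a 256-bit bus on both consoles, DDR3 at 2133 MT/s for Durango and GDDR5 at 5.5 Gbps per pin for Orbis (rumored values, not stated in this thread); `peak_bandwidth_bytes_per_sec` is a hypothetical helper name:

```python
def peak_bandwidth_bytes_per_sec(bus_width_bits, transfers_per_sec):
    """Peak bandwidth = bytes moved per transfer * transfers per second."""
    return (bus_width_bits // 8) * transfers_per_sec

# Assumed (rumored) configurations, not confirmed in this thread.
durango_ddr3 = peak_bandwidth_bytes_per_sec(256, 2_133_000_000)
orbis_gddr5 = peak_bandwidth_bytes_per_sec(256, 5_500_000_000)

print(f"Durango DDR3: {durango_ddr3 / 1e9:.1f} GB/s")
print(f"Orbis GDDR5:  {orbis_gddr5 / 1e9:.1f} GB/s")
```

Under those assumptions the results line up with the 68 GB/s and 176 GB/s numbers being quoted. These are theoretical peaks; sustained bandwidth will be lower.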

Brad Grenz · Feb 4, 2013

Love_In_Rio said:
Well, we could trust that rumor that says that depth and color blocks are more than standard back ends and offer free 4xMSAA and FP16 HDR.

Out of the blue everything would change and the available BW would even seem too much.

ERP, stop the fabs a little more!.

Single-cycle 4x MSAA and FP16 are what standard AMD back ends have provided since RV770. It's always bandwidth permitting, though, not "free". That's just another example of the tipsters not really understanding the specs.

I think it's very likely all the "special sauce" people were bragging about comes from the parts of the documentation covering features Durango has beyond what the 360 could do. For example, color and Z compression. They didn't bother with these on the 360 since the ROPs had so much internal bandwidth to the embedded memory in that design, but they make a point of explaining their presence and benefits in Durango.

Likewise the embedded memory is now more flexible so they explain the beefed up DMAs they've included to help manage the shifting of data between pools. Since MS renamed every other functional unit in the diagram, we don't have any reason to believe they are much more sophisticated than the DMAs GCN has always relied on, though now there are twice as many. Maybe the small pool of ESRAM means you have to move data around more often, or sharing with the CPU requires more attention there.

Everything else seems pretty bog standard GCN. I guess the audio chip could still be pretty cool.

tunafish · Feb 4, 2013

Gipsel said:
I know it was 8 cycles with the VLIW architectures, but is this documented or measured somewhere for GCN?

Honestly, now that I think of it, I actually do not know. I mostly work on CUDA at the moment, which means I have only written a tiny amount of code for GCN. I thought that was how it worked, but I honestly can't remember where I learned that, or if I'm just mixing up GCN and older stuff in my head. Once I get home from work I'm going to look it up, or even do a little test.

scently · Feb 4, 2013

Brad Grenz said:
Single-cycle 4x MSAA and FP16 are what standard AMD back ends have provided since RV770. It's always bandwidth permitting, though, not "free". That's just another example of the tipsters not really understanding the specs.

I think it's very likely all the "special sauce" people were bragging about comes from the parts of the documentation covering features Durango has beyond what the 360 could do. For example, color and Z compression. They didn't bother with these on the 360 since the ROPs had so much internal bandwidth to the embedded memory in that design, but they make a point of explaining their presence and benefits in Durango.

Likewise the embedded memory is now more flexible so they explain the beefed up DMAs they've included to help manage the shifting of data between pools. Since MS renamed every other functional unit in the diagram, we don't have any reason to believe they are much more sophisticated than the DMAs GCN has always relied on, though now there are twice as many. Maybe the small pool of ESRAM means you have to move data around more often, or sharing with the CPU requires more attention there.

Everything else seems pretty bog standard GCN. I guess the audio chip could still be pretty cool.

Indeed. The way I see it, and from what Al was saying on NeoGAF, the hardware to provide free 4xMSAA is there, but bandwidth has always been the problem. Including the eSRAM provides the bandwidth and framebuffer space to take advantage of the 4xMSAA hardware already in place. The choice of eSRAM is peculiar, though: for the same budget they could have gotten something like 4 times the capacity in eDRAM, so the advantages of SRAM (low latency, no need for refresh, etc.) are probably key here.
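
To put rough numbers on the framebuffer point, here is a back-of-the-envelope sketch of how much space a 4xMSAA color plus depth target would need. It assumes 4 bytes of color and 4 bytes of depth/stencil per sample and no compression; those formats and the helper name `render_target_bytes` are illustrative assumptions, not anything from the leaks:

```python
ESRAM_BYTES = 32 * 1024 * 1024  # 32 MB ESRAM, the rumored size

def render_target_bytes(width, height, samples, color_bytes=4, depth_bytes=4):
    """Uncompressed footprint of one color surface plus one depth surface."""
    return width * height * samples * (color_bytes + depth_bytes)

for name, (w, h) in {"720p": (1280, 720), "1080p": (1920, 1080)}.items():
    size = render_target_bytes(w, h, samples=4)
    verdict = "fits" if size <= ESRAM_BYTES else "does not fit"
    print(f"{name} 4xMSAA: {size / 2**20:.1f} MiB, {verdict} in 32 MiB")
```

Under these assumptions a 720p 4xMSAA target fits and a 1080p one does not, which ignores compression and any splitting of surfaces across memory pools.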

Prophecy2k · Feb 4, 2013

scently said:
Indeed. The way I see it, and from what Al was saying on NeoGAF, the hardware to provide free 4xMSAA is there, but bandwidth has always been the problem. Including the eSRAM provides the bandwidth and framebuffer space to take advantage of the 4xMSAA hardware already in place. The choice of eSRAM is peculiar, though: for the same budget they could have gotten something like 4 times the capacity in eDRAM, so the advantages of SRAM (low latency, no need for refresh, etc.) are probably key here.

I think we've been through this too many times on this forum. You should check posts by Hornet, myself and others in the "Predict a Next Gen..." thread (now locked).

Durango's ESRAM will be 1T-SRAM, which is more closely related to eDRAM than to actual SRAM (6T, or 6 transistors per cell). 32MB of (6T) SRAM would be ridiculous and infeasible, so the only reasonable option is 1T-SRAM, the same type used in the GameCube.

It'll be denser than eDRAM, available for manufacturing on a 28nm process, and possible to put on the same die as the other components.

scently · Feb 4, 2013

Prophecy2k said:
I think we've been through this too many times on this forum. You should check posts by Hornet, myself and others in the "Predict a Next Gen..." thread (now locked).

Durango's ESRAM will be 1T-SRAM, which is more closely related to eDRAM than to actual SRAM (6T, or 6 transistors per cell). 32MB of (6T) SRAM would be ridiculous and infeasible, so the only reasonable option is 1T-SRAM, the same type used in the GameCube.

It'll be denser than eDRAM, available for manufacturing on a 28nm process, and possible to put on the same die as the other components.

Yeah, this is a rumor thread, and until official specs come out you can't say for sure. What you guys said is just speculation, just as much as what I said. And from what you are saying, it is still denser than eDRAM. In fact, what I am saying is more probable than Orbis shipping with 8GB of GDDR5 RAM, which I think is unrealistic, but as long as they are still rumors the speculation is acceptable.

Arwin · Feb 4, 2013

Shifty Geezer said:
Think of it like a PC with 68 GB/s main memory and 102 GB/s VRAM.

Well, except that very few GPUs have only 32MB of VRAM. In that respect, isn't it almost more of a large, fast GPU cache/scratchpad?

Also, I wonder if the GPU is again the 'boss' of the DDR3, just like it was in the 360, if I remember correctly?

At any rate, the 32MB of VRAM, however small, certainly still adds a lot of bandwidth potential to the system.

Love_In_Rio · Feb 4, 2013

More wood (there are three pages of info):

http://www.vgleaks.com/durango-gpu/

Clock rate: 800 MHz

Compute
Shader cores: 12
Instruction issue rate: 12 SCs * 4 SIMDs * 16 threads/clock = 768 ops/clock
FLOPs: 768 ops/clock * (1 mul + 1 add) * 800 MHz = 1.2 TFLOPS
Interpolation: (768 ops/clock / 2 ops) * 800 MHz = 307.2 Gfloat/sec

Geometry
Triangle rate: 2 tri/clock * 800 MHz = 1.6 Gtri/sec
Vertex rate: 2 vert/clock * 800 MHz = 1.6 Gvert/sec
Vertex/buffer fetch rate (4 bytes): 4 elements/clock * 12 SCs * 800 MHz = 38.4 Gelement/sec
Vertex/buffer data rate from cache: 38.4 Gelements/sec * 4 bytes = 153.6 GB/sec

Memory
Peak throughput from main RAM: 68 GB/sec
Peak throughput from ESRAM: 128 bytes/clock * 800 MHz = 102.4 GB/sec
ESRAM size: 32 MB
GSM size: 64 KB
LSM size: 12 SCs * 64 KB = 768 KB
L2 cache size: 4 x 128 KB = 512 KB (shared)

Texture
Bilinear fetch rate (4 bytes): 4 fetches/clock * 12 SCs * 800 MHz = 38.4 Gtexels/sec
Bilinear data rate from cache: 38.4 Gtexels/sec * 4 bytes = 153.6 GB/sec
L1 cache size: 16 KB/SC * 12 SCs = 192 KB (nonshared)

Output
Color/depth blocks: 4
Pixel clear rate: 1 8×8 tile/clock * 4 DBs * 800 MHz = 204.8 Gpixel/sec
Pixel hierarchical Z cull rate: 1 8×8 tile/clock * 4 DBs * 800 MHz = 204.8 Gpixel/sec
Sample Z cull rate: 16 /clock * 4 DBs * 800 MHz = 51.2 Gsample/sec
Pixel emit rate: 4 /clock * 4 DBs * 800 MHz = 12.8 Gpixel/sec
Pixel resolve rate: 4 /clock * 4 DBs * 800 MHz = 12.8 Gpixel/sec
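
The peak figures above are all simple products of unit counts, per-clock widths and the 800 MHz clock. A small sketch that recomputes a few of them from the leaked inputs (the variable names are mine, purely for illustration):

```python
CLOCK_HZ = 800_000_000        # 800 MHz
SHADER_CORES = 12
SIMDS_PER_SC = 4
THREADS_PER_SIMD_CLOCK = 16
DEPTH_BLOCKS = 4

ops_per_clock = SHADER_CORES * SIMDS_PER_SC * THREADS_PER_SIMD_CLOCK

rates = {
    "ops/clock": ops_per_clock,
    "FLOP/s": ops_per_clock * 2 * CLOCK_HZ,             # 1 mul + 1 add per op
    "interpolated floats/s": ops_per_clock // 2 * CLOCK_HZ,
    "bilinear texels/s": 4 * SHADER_CORES * CLOCK_HZ,
    "texture bytes/s from cache": 4 * SHADER_CORES * CLOCK_HZ * 4,
    "ESRAM bytes/s": 128 * CLOCK_HZ,
    "pixel clear/s": 8 * 8 * DEPTH_BLOCKS * CLOCK_HZ,   # one 8x8 tile per DB per clock
    "pixel emit/s": 4 * DEPTH_BLOCKS * CLOCK_HZ,
}

for name, value in rates.items():
    print(f"{name:28s} {value:,}")
```

Note that the quoted 1.2 TFLOPS is a rounding of 1,228.8 GFLOPS, and all of these are theoretical peaks rather than sustained rates.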

Hecatoncheires · Feb 4, 2013

There are also pages 2 and 3.

Love_In_Rio · Feb 4, 2013

Interesting tidbits about memory:

Virtual Addressing

All GPU memory accesses on Durango use virtual addresses, and therefore pass through a translation table before being resolved to physical addresses. This layer of indirection solves the problem of resource memory fragmentation in hardware—a single resource can now occupy several noncontiguous pages of physical memory without penalty.

Virtual addresses can target pages in main RAM or ESRAM, or can be unmapped. Shader reads and writes to unmapped pages return well-defined results, including optional error codes, rather than crashing the GPU. This facility is important for support of tiled resources, which are only partially resident in physical memory.
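
As a rough mental model of the behavior described here, a toy page table might look like the sketch below. Everything in it is hypothetical: the 64 KB page size, the `ToyGpuMemory` class, and the choice to return 0 plus a "resident" flag for unmapped reads are my assumptions for illustration, not details from the documentation.

```python
PAGE_SIZE = 64 * 1024  # assumed page size, not taken from the leak

class ToyGpuMemory:
    """Toy model: virtual pages map to (pool, physical page) or stay unmapped."""

    def __init__(self):
        self.page_table = {}                     # virtual page -> (pool, physical page)
        self.pools = {"main": {}, "esram": {}}   # pool -> {physical page: bytearray}

    def map_page(self, vpn, pool, ppn):
        self.pools[pool].setdefault(ppn, bytearray(PAGE_SIZE))
        self.page_table[vpn] = (pool, ppn)

    def write(self, vaddr, value):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.page_table.get(vpn)
        if entry is None:
            return False                         # write to unmapped page is dropped
        pool, ppn = entry
        self.pools[pool][ppn][offset] = value
        return True

    def read(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.page_table.get(vpn)
        if entry is None:
            return 0, False                      # well-defined result plus status flag
        pool, ppn = entry
        return self.pools[pool][ppn][offset], True

mem = ToyGpuMemory()
mem.map_page(0, "main", 7)    # contiguous virtual pages...
mem.map_page(1, "esram", 2)   # ...backed by scattered physical pages in two pools
mem.write(PAGE_SIZE + 5, 42)
print(mem.read(PAGE_SIZE + 5))   # mapped page in ESRAM
print(mem.read(2 * PAGE_SIZE))   # unmapped page: no crash, just a flagged result
```

The point of the sketch is only that one resource can span noncontiguous pages across both pools, and that touching a non-resident page yields a defined answer instead of a fault.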

ESRAM

Durango has no video memory (VRAM) in the traditional sense, but the GPU does contain 32 MB of fast embedded SRAM (ESRAM). ESRAM on Durango is free from many of the restrictions that affect EDRAM on Xbox 360. Durango supports the following scenarios:

Texturing from ESRAM
Rendering to surfaces in main RAM
Read back from render targets without performing a resolve (in certain cases)

The difference in throughput between ESRAM and main RAM is moderate: 102.4 GB/sec versus 68 GB/sec. The advantages of ESRAM are lower latency and lack of contention from other memory clients—for instance the CPU, I/O, and display output. Low latency is particularly important for sustaining peak performance of the color blocks (CBs) and depth blocks (DBs).

Hornet · Feb 4, 2013

I wonder if these are standard features already available in Cape Verde/Pitcairn/Tahiti.

Unlike some earlier GPUs (including the Xbox 360 GPU), Durango leaves texture and buffer data in native compressed form in the L2 and L1 caches. [...]
To see how this policy affects cache efficiency, consider an sRGB BC1 texture—perhaps the most commonly encountered texture type in games. BC1 is a 4-bit per texel format; on Durango, this texture occupies 4 bits per texel in the L1 cache. On Xbox 360, the same texture is decompressed and gamma corrected before it reaches the cache, and therefore occupies 8 bytes per texel, or 16 times the Durango footprint. For this reason, the Durango L1 cache behaves like a much larger cache when compared against previous architectures.
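
The 16x figure in that passage is easy to check. A quick sketch, assuming for the sake of comparison that both designs had the same 16 KB L1 (the Durango per-SC size from the leaked table; the 360's actual cache size is not given here), with `texels_resident` as a made-up helper name:

```python
L1_BYTES_PER_SC = 16 * 1024          # Durango L1 per shader core, from the leaked table
BC1_BITS_PER_TEXEL = 4               # kept compressed in cache on Durango
DECOMPRESSED_BITS_PER_TEXEL = 8 * 8  # 8 bytes per texel after decompression (Xbox 360 case)

def texels_resident(cache_bytes, bits_per_texel):
    """How many texels fit in a cache of the given size."""
    return cache_bytes * 8 // bits_per_texel

compressed = texels_resident(L1_BYTES_PER_SC, BC1_BITS_PER_TEXEL)
decompressed = texels_resident(L1_BYTES_PER_SC, DECOMPRESSED_BITS_PER_TEXEL)

print(f"BC1 kept compressed:    {compressed} texels")
print(f"Decompressed to 8 B:    {decompressed} texels")
print(f"Effective capacity gain: {compressed // decompressed}x")
```

So the same 16 KB holds 32,768 BC1 texels compressed versus 2,048 decompressed, which is where the "behaves like a much larger cache" claim comes from.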

The Durango GPU supports 2x, 4x, and 8x MSAA levels. It also implements a modified type of MSAA known as compressed AA. [...]
Traditionally, coverage samples and surface samples match up one to one. [...]
Under compressed AA, there can be more coverage samples than surface samples.[...]
Compressed AA combines most of the quality benefits of high MSAA levels with the relaxed space requirements of lower MSAA levels.
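
A rough way to see the space saving: if color storage scales with the number of surface samples and each coverage sample only needs a small index saying which surface sample it refers to, the per-pixel cost looks something like the sketch below. The index encoding and the helper `bytes_per_pixel` are my assumptions for illustration; the excerpt does not describe the actual memory layout.

```python
def bytes_per_pixel(coverage_samples, surface_samples, bytes_per_sample=4):
    """Assumed layout: full color per surface sample, plus a packed
    per-coverage-sample index selecting one of the surface samples."""
    color = surface_samples * bytes_per_sample
    if coverage_samples == surface_samples:
        return color                                   # traditional MSAA, one to one
    index_bits = (surface_samples - 1).bit_length()    # bits to address a surface sample
    coverage = -(-coverage_samples * index_bits // 8)  # ceiling division to whole bytes
    return color + coverage

traditional_8x = bytes_per_pixel(8, 8)
compressed_8_over_4 = bytes_per_pixel(8, 4)
print(traditional_8x, compressed_8_over_4)
```

Under this assumed layout, 8 coverage samples over 4 surface samples cost 18 bytes per pixel against 32 for plain 8x MSAA, which is the kind of trade the quoted text is describing.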

Hecatoncheires · Feb 4, 2013

In my eyes it would have been a much better solution for everyone if they had just listed the differences between the Durango GPU and a common GCN GPU instead of this text overkill!

sir doris · Feb 4, 2013

Hecatoncheires said:
In my eyes it would have been a much better solution for everyone if they had just listed the differences between the Durango GPU and a common GCN GPU instead of this text overkill!

But then VGLeaks may have found they had very little content lol

Gipsel · Feb 4, 2013

Hornet said:
I wonder if these are standard features already available in Cape Verde/Pitcairn/Tahiti.

The first one, of course. I'm not completely sure about the second one, but I would think so, too.

fellix · Feb 4, 2013

Hornet said:
I wonder if these are standard features already available in Cape Verde/Pitcairn/Tahiti.

All of those are standard features of GCN. Coverage sampling was actually featured way back in the HD6900 series.

Hornet · Feb 4, 2013

fellix said:
All of those are standard features of GCN. Coverage sampling was actually featured way back in the HD6900 series.

I see. Other than ESRAM, it looks like a pretty standard GCN GPU. The only good thing I can think of about this design is that it is likely going to be cheap, unless they bundle some expensive peripheral in every box. This SoC should be less than 250mm^2, with a TDP of less than 120 W. Considering the motherboard layout will also be simpler than a launch Xbox 360's, they could launch at $300 without breaking the bank.

ultragpu · Feb 4, 2013

So what exactly differentiates a Shader Core from a common GCN Compute Unit?

Love_In_Rio · Feb 4, 2013

ultragpu said:
So what exactly differentiates a Shader Core from a common GCN Compute Unit?

F.U.D.

ultragpu · Feb 4, 2013

Love_In_Rio said:
F.U.D.

There has to be some merit to it... just how much can we trust VGLeaks?

Scott_Arm · Feb 4, 2013

ultragpu said:
So what exactly differentiates a Shader Core from a common GCN Compute Unit?

Looking at the specs, there doesn't seem to be a difference. It's just listed under a different name on the block diagrams, wherever those came from. It seems like a GCN GPU, just a lower-end one than the one in Orbis.
