Xbox One (Durango) Technical hardware investigation

It can't be on top: the chip is labelled as being diffused in Taiwan. They'd have to sneak their alleged 22nm IBM SOI GPU under it, unless we want to add an international trade dispute to our rumored Durango feature set.

I think the whole "22nm IBM SOI" thing comes from an old 720 product pitch/specs leak from 2011 which also had the APU/dGPU concept.
 
I'm not sure where complexity figures into it... In the demo, the person on the tablet was purported to be in a remote location, perhaps commuting home on a bus. You don't pipe graphics over a 3G connection. You just don't. Even piping it out over a typical home upstream network connection is a little unbelievable.

The least-crazy explanation is that the tablet is running a little "The Division Lite" app, and it is drop-in connecting to the same network match as the other players. (An interesting concept. I hope it actually happens.)

[EDIT] Sorry, I didn't realize I was so many pages behind in this thread.

I'll add a question: I saw the term "diffused" printed on the XB1 SOC. Someone in this thread also used it. What does it mean? Is it a fabrication step?

http://www.ti.com/corp/docs/manufacturing/howchipmade.shtml

I had to look when I saw that.
 
I'm not sure where complexity figures into it... In the demo, the person on the tablet was purported to be in a remote location, perhaps commuting home on a bus. You don't pipe graphics over a 3G connection. You just don't. Even piping it out over a typical home upstream network connection is a little unbelievable.

I didn't realize 3G was part of this; I always assumed it was all about a home WLAN. But since the game is an MMO anyway, doing client apps for tablets makes sense.
 
It's been mentioned many times before, and most of the applicants on that patent are the architects publicly speaking about Xbox One. One of those applicants, Mark Grossman, is a partner architect for Microsoft who does work with AMD/ATI and is listed in a couple of ATI patents.

Of course, none of this guarantees that the Xbox One is that design... it's all just speculation.

P.S. Don't read too much into that patent, especially the diagrams that clearly show multiple GPUs... that argument has been made many times before :)
Mark used to work at ATI so if he's on an ATI patent that would be why.
 
I'll add a question: I saw the term "diffused" printed on the XB1 SOC. Someone in this thread also used it. What does it mean? Is it a fabrication step?

Diffusion is a chemical process and one of the steps in fabrication. In the case of die labeling, it is being used to denote where the chip was fabbed. The foundry is TSMC and the line in question is in Taiwan. The result is patterned silicon, one component of the final assembled chip package.

The line below it saying Malaysia indicates where the component die is combined with its substrate and other package components.

GF chips from Dresden say diffused in Germany, for example.
I think Intel has some kind of code for its fabs these days.
Both can have chips mentioning Malaysia.
There are other locations as well. Intel has had Costa Rica and Philippines before.

Given the particular needs of these chips, it can be more cost effective to ship the silicon across the ocean so that a plant that specializes in packaging the die in volume can finish the work.
 
New info, ish? Half the SoC taken up by caches?

http://semiaccurate.com/2013/09/03/xbox-ones-sound-block-is-much-more-than-audio/

In the end what do we have? A 5+ billion transistor SoC built on TSMC’s 28nm HP process. The 361mm^2 die size, ~381mm^2 counting scribe area, lines up nicely with similar transistor count AMD GPUs made on the same process. Unfortunately no TDP was disclosed or even hinted at. All the caches on the die total up to about 47MB so that will dominate the SoC’s area, about half the die according to the architect, much of which is distributed to the various units.

Hope MS knows what they're doing with that.

Assuming it's accurate since it's just Charlie recounting a conversation, and Charlie articles tend to be riddled with inaccuracies. But this sounds reasonable.
 
AMD notably has packaging facilities in Malaysia.

And Intel too, at Bayan Lepas. It's a free trade zone, which makes Malaysia competitive enough for final assembly and packaging.

But in this case, I wonder who's doing that part; the XO die is TSMC-made...
 
I think the guy uses "cache" pretty loosely.
Anyway, there is a lot of memory in any modern design, be it CPU or GPU.
In the CPU you have the L1 and L2 (at this point he may as well count the various register banks); in the GPU, the huge banks of registers within the CUs, the L1, GDS, L2, ROP caches and so on.
Looking at the Kabini die, one can see that the L2 plus the L2 interface is almost as big as the four Jaguar cores.
Blend in 32MB of eSRAM (+redundancy), then the memory included in the specialized units, and half the chip being memory cells isn't crazy at all. ALUs are tiny; I don't think they are the main contributor to die size in any modern design, be it CPU or GPU.
It was pretty obvious looking at AMD's old SIMDs that the register banks were taking a healthy amount of space within the SIMD.
 
Who thinks twice? The mods? It's remarkable how many people seem to misunderstand entirely why they got their infractions, and then go complaining in the middle of some discussion. People posting unsubstantiated rumours as facts in a discussion not about the rumours during silly season can expect to be reprimanded for being OT. It doesn't matter who your source is: unless you're a proven developer working on the HW and known to be telling the truth, you can't post rumour as fact in discussions, because of how that affects the discussion.

Well, I didn't mean to be offensive to the admins, just pointing out that given his track record he deserved some credibility, and I fully understand the difference between that and unsubstantiated rumours passing as facts.

As you say, if it were people like Bkillian or Dave spilling some beans, they wouldn't have been warned, but you know they wouldn't risk breaking those NDAs.

And that is the sad thing about these threads: the people who really know can't talk freely, and we're left with rumours.

At least we've got good reading about the feasibility of this and that.

That eastmen's upclock rumour turned out to be true doesn't justify him posting rumour as fact, nor does it give him carte blanche to post his sources as fact in general discussion. How exactly is one supposed to know which posters are allowed to post their rumours as facts, such that the mods know when to think twice about removing rumours from a discussion outside of the rumour thread deliberately created to manage that side of discussion?

This is of course OT, but if I removed the OT remarks I'd be accused of censorship and silencing the dissenters.
Yes, we're going OT.
Sorry for that.
 
32MB of eSRAM is huge.

I'm still not sure where all the other 15MB are.

CPU: around 4.5MB (2*2MB L2, 8*64KB L1, ? TLB size, ? registers)
Audio DSP: 64KB SRAM, 24KB + 3*48KB
GPU: 3*4*(4*64KB registers + 16KB L1) + 3*48KB K$/I$ + 4*128KB L2 is about 4MB

which leaves 5MB of undiscovered land. Any input here?
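
For reference, here's that tally as a quick Python sketch, just to sanity-check the arithmetic; every per-unit figure is only the speculation above, nothing confirmed:

```python
# Rough tally of the SRAM guesses above, sizes in KB. All figures are
# forum speculation, not confirmed numbers.
KB, MB = 1, 1024

sram_kb = {
    "eSRAM":       32 * MB,
    "CPU L2":       2 * 2 * MB,              # 2 modules x 2MB
    "CPU L1":       8 * 64 * KB,             # 8 cores x (32KB I$ + 32KB D$)
    "Audio DSP":   64 * KB + 24 * KB + 3 * 48 * KB,
    "GPU regs+L1": 12 * (4 * 64 + 16) * KB,  # 12 CUs
    "GPU K$/I$":    3 * 48 * KB,
    "GPU L2":       4 * 128 * KB,
}

total = sum(sram_kb.values())
print(f"accounted for: {total / MB:.1f} MB")       # ~40.6 MB
print(f"gap vs 47 MB:  {47 - total / MB:.1f} MB")  # ~6.4 MB, in the same
                                                   # ballpark as the ~5MB
                                                   # guessed above
```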
 
I think the guy uses "cache" pretty loosely.
Anyway, there is a lot of memory in any modern design, be it CPU or GPU.
In the CPU you have the L1 and L2 (at this point he may as well count the various register banks); in the GPU, the huge banks of registers within the CUs, the L1, GDS, L2, ROP caches and so on.
Looking at the Kabini die, one can see that the L2 plus the L2 interface is almost as big as the four Jaguar cores.
Blend in 32MB of eSRAM (+redundancy), then the memory included in the specialized units, and half the chip being memory cells isn't crazy at all. ALUs are tiny; I don't think they are the main contributor to die size in any modern design, be it CPU or GPU.
It was pretty obvious looking at AMD's old SIMDs that the register banks were taking a healthy amount of space within the SIMD.

In the Sessler and Albert interview, Albert uses the term "cache" to describe the eSRAM, and it's not the first time. I take it as dumbing things down, though.
 
GPU: 3*4*(4*64KB registers + 16KB L1) + 3*48KB K$/I$ + 4*128KB L2 is about 4MB

which leaves 5MB of undiscovered land. Any input here?
+ 12 * 8KB scalar registers
+ 12 * 64KB LDS
+ 4 * (4KB + 16KB) ROP caches

And I remember someone claiming the eSRAM is using ECC, which would be 32 MB * 9/8 = 36MB with some creative counting.

All in all, roughly half of the transistors (not area) are probably SRAM.
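
For the arithmetic-minded, here's the earlier tally extended with these items, as a sketch under the same speculative assumptions:

```python
# Extending the earlier tally (in KB) with this post's additions and the
# claimed ECC overhead on the eSRAM. Speculative figures from the thread.
KB, MB = 1, 1024

base = 41528 * KB               # subtotal from the list a few posts up
extras = (12 * 8 * KB           # scalar registers, 12 CUs
          + 12 * 64 * KB        # LDS
          + 4 * (4 + 16) * KB)  # ROP color + depth caches
ecc = 32 * MB // 8              # the 9/8 counting: +4MB of ECC bits

print(f"{(base + extras + ecc) / MB:.1f} MB")  # ~45.5 MB, close to 47 MB

# Rough check on "half the transistors": 47MB of 6T SRAM cells is about
# 47 * 2**20 * 8 * 6 ~= 2.4e9 transistors, nearly half of the ~5e9 total
# (ignoring sense amps, tags, redundancy, etc.).
```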
 
Didn't they say "internal storage", counting even the Flash?

My bad, 47MB not GB XD

Anyway, what would be the point of using ECC in a consumer-level device?
 
Anyway, what would be the point of using ECC in a consumer-level device?
One reason is that on-die ECC has been migrating downwards as the base designs have been made to reach into new markets like servers and HPC. The base designs of the major units would carry it forward.

The other is that AMD and Microsoft should have some base error rate for the SRAM, which informs other things like the long-term reliability of the device, yields, and the voltages it can operate at.
Shrinks lead to increasingly smaller and leakier transistors, lower voltages, and worsening variability, which means that SRAM is far more vulnerable to errors and degradation as geometries shrink.

Even if consumer chips have a low bar for reliability, the trends are such that reaching it while balancing all the other design goals is not as straightforward as it once was.
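
To make that concrete, here's a toy SECDED (single-error-correct, double-error-detect) Hamming code over a 64-bit word: 7 check bits plus one overall parity bit per 64 data bits, which is exactly the 9/8 overhead mentioned a few posts up. This is the textbook construction, not necessarily what the actual chip does:

```python
# Toy SECDED Hamming code for a 64-bit word: 7 check bits + 1 overall
# parity bit = 8 ECC bits per 64 data bits (the 9/8 overhead above).
# Illustration only.

DATA_BITS = 64
CHECK_BITS = 7                      # 2**7 >= 64 + 7 + 1
N = DATA_BITS + CHECK_BITS          # positions 1..71; slot 0 = overall parity

def encode(data):
    bits = [0] * (N + 1)
    d = 0
    for pos in range(1, N + 1):
        if pos & (pos - 1):         # non-power-of-two slots hold data
            bits[pos] = (data >> d) & 1
            d += 1
    for i in range(CHECK_BITS):     # power-of-two slots hold parity
        p = 1 << i
        bits[p] = sum(bits[pos] for pos in range(1, N + 1) if pos & p) & 1
    bits[0] = sum(bits) & 1         # overall parity for double-error detect
    return bits

def decode(bits):
    syndrome = 0
    for i in range(CHECK_BITS):
        p = 1 << i
        if sum(bits[pos] for pos in range(1, N + 1) if pos & p) & 1:
            syndrome |= p
    overall = sum(bits) & 1
    status = "ok"
    if syndrome and overall:        # single-bit error: syndrome = position
        bits = bits[:]
        bits[syndrome] ^= 1
        status = "corrected"
    elif syndrome:                  # syndrome set but parity even
        status = "double error detected"
    data, d = 0, 0
    for pos in range(1, N + 1):
        if pos & (pos - 1):
            data |= bits[pos] << d
            d += 1
    return data, status

word = 0xDEADBEEFCAFEF00D
cw = encode(word)
cw[13] ^= 1                         # simulate a single-bit SRAM upset
recovered, status = decode(cw)
assert recovered == word and status == "corrected"
```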
 
One reason is that on-die ECC has been migrating downwards as the base designs have been made to reach into new markets like servers and HPC. The base designs of the major units would carry it forward.

The other is that AMD and Microsoft should have some base error rate for the SRAM, which informs other things like the long-term reliability of the device, yields, and the voltages it can operate at.
Shrinks lead to increasingly smaller and leakier transistors, lower voltages, and worsening variability, which means that SRAM is far more vulnerable to errors and degradation as geometries shrink.

Even if consumer chips have a low bar for reliability, the trends are such that reaching it while balancing all the other design goals is not as straightforward as it once was.

Many thanks, really informative.

Now a crazy idea: ECC at the SRAM level would mean extra transistors, so could they be used as redundant memory?

- 100% working eSRAM: ECC on
- Some defects (no larger than the extra SRAM for ECC): ECC off

The second Durango would be more prone to memory errors, but fully functional.
 
Now a crazy idea: ECC at the SRAM level would mean extra transistors, so could they be used as redundant memory?

- 100% working eSRAM: ECC on
- Some defects (no larger than the extra SRAM for ECC): ECC off

The second Durango would be more prone to memory errors, but fully functional.

ECC bits can be repurposed for other things. For example, AMD chips like the K8 have ECC for the L1 data cache and parity for the instruction cache. The L1I stores branch selection bits as well.
When instructions are evicted to the ECC-protected L2, the ECC bits are used to keep track of branch data from the L1, allowing that branch history to be maintained.

This works because the cache access logic knows which specific bits can be treated differently, and the logic is wired not to treat the repurposed L2 ECC bits as ECC when the line goes back into the instruction cache.

Using the ECC bits as redundant bits in the event of a bit flaw would be more complicated. You'd have to keep a table of which bits are remapped, which the access logic would need to consult on the fly. The logic would access the array, check the remapping, then shift around and insert the spare bit into the final result. The same would have to happen when writing.

Typically, there are spare lines that can be set at manufacturing for use. This keeps the actual access work consistent and straightforward.
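
As a sketch of why the fuse-table approach keeps the access path simple (all names and sizes here are made up for illustration, nothing Durango-specific):

```python
# A sketch of manufacturing-time row redundancy: rows that fail wafer
# test are fuse-mapped to spare rows, so a normal access only needs one
# extra lookup (no per-bit shifting on every read/write).

class RedundantSRAM:
    def __init__(self, rows, spares):
        self.storage = [0] * (rows + spares)
        self.remap = {}              # bad row -> spare row (the "fuses")
        self.next_spare = rows
        self.end = rows + spares

    def map_out(self, bad_row):
        """Program the fuses for a row that failed test."""
        if self.next_spare >= self.end:
            return False             # out of spares; die is scrapped/binned
        self.remap[bad_row] = self.next_spare
        self.next_spare += 1
        return True

    def _row(self, addr):
        # The only extra work per access: one match against the fuse
        # table, which real designs do in parallel with the row decode.
        return self.remap.get(addr, addr)

    def read(self, addr):
        return self.storage[self._row(addr)]

    def write(self, addr, value):
        self.storage[self._row(addr)] = value

# Row 5 failed test, so it is fused over to a spare before shipping;
# software never sees the difference.
sram = RedundantSRAM(rows=1024, spares=8)
sram.map_out(5)
sram.write(5, 0xABCD)
assert sram.read(5) == 0xABCD
```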
 
Didn't they say "internal storage", counting even the Flash?

My bad, 47MB not GB XD

Anyway, what would be the point of using ECC in a consumer-level device?

Uhm, 8 GIGS of flash would be a metric fuckton more than the unaccounted-for 4 megs.
 
What modifications did AMD/MS make to the Jaguar CPUs aside from the 4MB of cache? It does get 30GB/s of bandwidth, correct?

And if someone could explain one thing to me: if the CPU gets 30GB/s of bandwidth, that leaves less than 38GB/s for everything else, including the DMAs feeding data to the GPU. Will that ever pose a problem?
 