Xbox One (Durango) Technical hardware investigation

I would grant AMD's CPUs enough capability to handle two VM trips every 16 or 30ms
You can have much more... The average cost of a VMExit-equivalent should be around 1k cycles, probably less nowadays (but you have to re-enter, too). It is likely that some interrupts can't be virtualized in the PlatformOS and need the VMM.
Yet, even if you lose 2k cycles on an interrupt, you SHOULD be able to get more than one or two, given you have at least 6-8 processors running - an interrupt doesn't block all the cores!
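
A rough back-of-the-envelope sketch of that point (the ~1.6-1.75 GHz clocks and the 2k-cycle round-trip cost are assumptions pulled from this discussion, not measured figures):

```python
# Rough estimate of how many VMExit + VMEntry round trips fit in one frame.
# Assumptions (from the discussion above, not measurements):
#   - ~2,000 cycles lost per exit/re-entry pair
#   - Jaguar cores somewhere around 1.6-1.75 GHz
#   - a 16 ms (60 fps) or 33 ms (30 fps) frame budget, per core

CYCLES_PER_TRIP = 2_000

for clock_ghz in (1.6, 1.75):
    for frame_ms in (16, 33):
        frame_cycles = clock_ghz * 1e9 * frame_ms / 1_000
        trips = frame_cycles / CYCLES_PER_TRIP
        print(f"{clock_ghz:.2f} GHz, {frame_ms} ms frame: "
              f"~{trips:,.0f} round trips per core")
```

Even at the pessimistic 2k-cycle cost, a single core could absorb thousands of such trips per frame, so two trips per 16-30ms is nowhere near a limit.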

but I don't count on the GPU being on the same level.
For sure, but it's not a matter of the GPU - it's the CPU's interrupts that are virtualized.

We actually do have GPU latencies for various front-end functions and queueing delays under load (PS4, but still) and I was surprised how bad they could be (30 or so ms).
...just curious, is it due to the fact that those jobs have lower priority? Isn't there some internal priority in SI's queues/buffers that dictates where the next job will be picked from?
30ms looks very big, especially considering that SI should have 8x8 queues.
 
...just curious, is it due to the fact that those jobs have lower priority? Isn't there some internal priority in SI's queues/buffers that dictates where the next job will be picked from?
30ms looks very big, especially considering that SI should have 8x8 queues.

This was from the audio engine developer, so the desired applications were very latency sensitive and should have had high priority. At least at this early juncture, he did not believe in using the GPU for anything that would not tolerate multiple frames of delay.

Under light load, the delays are not significant, but when the system is under load the latencies become very long. If it isn't the queues themselves, it is shader engines not being able to free up a wavefront slot in a timely fashion.
 
If it isn't the queues themselves, it is shader engines not being able to free up a wavefront slot in a timely fashion.
Yep, I was thinking of this as a second option. Wouldn't it work to make a limited number of kernels, put them in a batch loop with S_SLEEP, and wake them up on demand?

OT, I do wonder what will happen on the Xbox One when the feature that allows you to send controlled messages (interrupts) from the GPU to the CPU starts to be used... just imagine your kernels asking the CPU to do some work... in the Xbox One's strangely virtualized model :O
 
Speculation.

Do these AMD tags mean anything to you guys?

The latest AMD APUs are designated A10. These are marked A12.

Plus, if you look closely, you see two designations: A12 41 and A12 40.

xbox-apu-amd-2012-300x300.jpg
 
Speculation.

Do these AMD tags mean anything to you guys?

The latest AMD APUs are designated A10. These are marked A12.

Plus, if you look closely, you see two designations: A12 41 and A12 40.

xbox-apu-amd-2012-300x300.jpg

Not much, and certainly not what misterxmedia has been spinning. The design is likely designated A12 because it isn't an A10 design - the integrated GPU and ESRAM are different - but the CPU cores are still Jaguar. What it most certainly is not is a stacked die or a stealth dGPU.
 
Just traces seen on the underside of the wafer, correct? Plus this is way out on the edge of the chip, in a corner I think. Here are some shots of the same look on a 486 chip; scroll to the bottom.

Or look out at the edge of the Wii U.

Edit: Do you see the same effect in this image?
ir-die-corner.jpg

I count about nine stacked layers, so there must be some serious dGPU going on in that! /sarcasm
 
Speculation.

Do these AMD tags mean anything to you guys?

The latest AMD APUs are designated A10. These are marked A12.

Plus, if you look closely, you see two designations: A12 41 and A12 40.

xbox-apu-amd-2012-300x300.jpg

All complex microprocessors & RAM consist of multiple 'metal layers' of interconnect (wiring) which connect the transistors on the transistor layer with one another. Even simple integrated circuits from the 1970s had a metal interconnect layer, and as transistor density increased, by the early 80s you started seeing chips with 2 metal layers. In that photo, the layers of goldish material are the multiple 'metal layers' of copper interconnect (wiring). AMD's Jaguar CPUs in their APU line have 11 metal layers. The translucent green you see underneath the top metal layer is multiple layers of translucent low-dielectric-constant material, such as forms of silicon carbide, which traditionally were forms of silicate glass. They sandwich and separate the transistor layer and the various metal interconnect layers.
 
Hmm, I just became aware there is an AMD card with the exact configuration of the Xbox One GPU: R7 260, 768 shaders, 16 ROPs, etc. Previously I had thought there was not an exact match. It seems to be a rare card, but Newegg has one for $109.

It has 96GB/s of bandwidth. I'm guessing that's fairly analogous to the One as well. It'll have more baseline BW but less peak; it might, very roughly, equal out.

The only "problem" is that it's clocked at 1GHz core. But one could underclock it.

Sadly, I kind of want to buy one, underclock it to ~853MHz core, and run PC benches on it. It's only $109. Heck, after being content, one could eBay it for a chunk of money back, or return it within 30 days and only be out 15% restock fees, etc...
 
Hmm, I just became aware there is an AMD card with the exact configuration of the Xbox One GPU: R7 260, 768 shaders, 16 ROPs, etc. Previously I had thought there was not an exact match. It seems to be a rare card, but Newegg has one for $109.

It has 96GB/s of bandwidth. I'm guessing that's fairly analogous to the One as well. It'll have more baseline BW but less peak; it might, very roughly, equal out.

The only "problem" is that it's clocked at 1GHz core. But one could underclock it.

Sadly, I kind of want to buy one, underclock it to ~853MHz core, and run PC benches on it. It's only $109. Heck, after being content, one could eBay it for a chunk of money back, or return it within 30 days and only be out 15% restock fees, etc...

ESRAM really mashed up the Xbox's design.
 
Hmm, I just became aware there is an AMD card with the exact configuration of the Xbox One GPU: R7 260, 768 shaders, 16 ROPs, etc. Previously I had thought there was not an exact match. It seems to be a rare card, but Newegg has one for $109.

It has 96GB/s of bandwidth. I'm guessing that's fairly analogous to the One as well. It'll have more baseline BW but less peak; it might, very roughly, equal out.

The only "problem" is that it's clocked at 1GHz core. But one could underclock it.

Sadly, I kind of want to buy one, underclock it to ~853MHz core, and run PC benches on it. It's only $109. Heck, after being content, one could eBay it for a chunk of money back, or return it within 30 days and only be out 15% restock fees, etc...

Wayyyy back I estimated the effective XB1 bandwidth based on the specs and came to about 100-110GB/s, I think. AMD are not dummies; they are not going to create a memory bottleneck, so they match the memory to the CUs, ROPs, etc.
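
For context, here are the raw figures such an estimate would start from. The bus widths and data rates below are the commonly quoted specs for the R7 260 and the Xbox One's DDR3; the ESRAM figure is Microsoft's quoted peak, not something derived here:

```python
# Peak memory bandwidth from bus width and data rate:
# bytes/s = (bus_bits / 8) * transfers_per_second
def peak_gb_per_s(bus_bits: int, data_rate_mtps: float) -> float:
    return bus_bits / 8 * data_rate_mtps * 1e6 / 1e9

# R7 260: 128-bit GDDR5 at 6.0 Gbps effective -> ~96 GB/s
print(f"R7 260 GDDR5:  {peak_gb_per_s(128, 6000):.1f} GB/s")

# Xbox One main memory: 256-bit DDR3-2133 -> ~68 GB/s,
# with the 32MB ESRAM quoted separately at roughly 109-204 GB/s peak.
print(f"Xbox One DDR3: {peak_gb_per_s(256, 2133):.1f} GB/s")

# Underclocking the R7 260 from 1000 MHz to the Xbox One's 853 MHz
# scales the compute peak by the same ratio.
print(f"853/1000 clock ratio: {853/1000:.3f}")
```

How much of the ESRAM's extra bandwidth shows up as "effective" bandwidth depends on how much of the working set fits in 32MB, which is why any blended 100-110GB/s figure is only an estimate.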
 
Knowing what is coming up and in what order is not helpful?? Sure, a polygon is a polygon, but being able to predict - in some cases with near certainty - what data access patterns you will need must be of some value. Games aren't made with one big array of data that is accessed randomly, so knowing what is going to come next can't help but give you better performance. With the ESRAM being fast to access but limited in size, it would be effective at getting the resources needed in and out, and in a particular order.

Yes and no. "Knowing what to load", a la "streaming", is commonly done now. Given the high bandwidth of the ESRAM, it would make sense to store buffers that are accessed repeatedly in the ESRAM. Tiling costs cycles and bandwidth, and given the size it would be natural to use it for the framebuffers. You can stage other assets in the ESRAM; I'm just not convinced that it'll actually work better.
 
From the ESRAM thread...
We're talking about the smart choice of using ESRAM here. And the fact of the matter is the choice between the two competitors' solutions is plain to see, because the scratchpad is too small in size.

Of course, even if we were talking about CPUs you'd still be incorrect, because we now know from certain benchmarking tools that the PS4's CPU can be exploited more fully, possibly because more cores are unlocked for games. And SHAPE is dedicated to Kinect processing; it has nothing to do with "offloading from the CPU" - you're not using that for actual game cycles.

Both systems reserve 2 CPU cores for the OS; that was well established last year, and we haven't heard any new information suggesting it has changed.

And that SHAPE can save CPU time is a clear fact.
SHAPE offloads >1 CPU core
http://www.examiner.com/article/xbo...e-benchmarks-revealed-at-hot-chips-conference

Bkilian has gone into great detail on this (and he would know since he worked on it)
I can't speak to what the PS4 has, but the X1 audio block would put the best sound card you can buy to shame. And that's _before_ you add in _any_ of the DSP cores. And the DSP core for scheduling removes a huge burden from the CPU requirements for audio processing. If all the chip did was offload effects scheduling and mixing, it would easily halve the CPU requirements for audio compared to the 360. It does a lot more than that.

But as I've said before, it helps with CPU processing and will not perform any magical GPU upgrade. It just means games on the X1 will have more CPU headroom to either use for reducing the amount of time it takes to get the game to a happy place CPU wise, or increase the amount of CPU tasks being done. The realist in me suspects it'll be the former. When given the choice of "We can have a lower development cost" versus "We can fit in a few more AI tasks", any game company that is not a first party is going to choose lower costs.

I haven't seen these CPU benchmarks; were they browser tests?
 
It was a benchmark for the Substance Engine middleware that showed a single CPU core in the PS4 is faster than a single core in the Xbox One. The implication is that the PS4 CPU is actually clocked higher than the Xbox One's, or the virtualization penalty for the Xbox One's OS is severe.

Hmm, I wonder how reliable that benchmark is at gauging relative differences in CPU power; do the results for other chipsets (e.g. Tegra, iPhone, Intel's Core line) accord with their relative standings on paper and in other benchmarks?

And if it is indeed a virtualization penalty, then it seems rather high - 22% (assuming that without the VM structure an XB1 core would match a PS4 core at 14 MB/s, then increasing it by 9.3% to factor in the higher clock on the XB1 side, and comparing that with the actual result of 12 MB/s to figure out the extra overhead due to virtualization).

And if the PS4 CPU is clocked higher than the XB1's, going by the relative difference in the benchmark the clock would be more than 2 GHz. This would require a huge increase in TDP:

Looking at Kabini, we have a good idea of the dynamic range for Jaguar on TSMC’s 28nm process: 1GHz - 2GHz. Right around 1.6GHz seems to be the sweet spot, as going to 2GHz requires a 66% increase in TDP.

http://www.anandtech.com/show/6976/...wering-xbox-one-playstation-4-kabini-temash/4
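
Spelling out that arithmetic (the 14 MB/s and 12 MB/s figures are the reported Substance Engine results; the 1.6 GHz / 1.75 GHz clocks are the commonly assumed values, so this is a sketch, not a measurement):

```python
# Reported Substance Engine throughput per core.
ps4_mbps, xb1_mbps = 14.0, 12.0
ps4_ghz, xb1_ghz = 1.6, 1.75  # commonly assumed clocks

# Case 1: the gap is purely a virtualization penalty on the XB1 side.
# Scale the PS4 result up by the XB1's higher clock, then see how far
# short the measured XB1 number falls.
expected_xb1 = ps4_mbps * (xb1_ghz / ps4_ghz)   # ~15.3 MB/s
overhead = 1 - xb1_mbps / expected_xb1          # ~0.22
print(f"Implied virtualization overhead: {overhead:.1%}")

# Case 2: no penalty; the gap is purely clock speed.
# The PS4 clock implied by the benchmark ratio would then be:
implied_ps4_ghz = xb1_ghz * (ps4_mbps / xb1_mbps)  # ~2.04 GHz
print(f"Implied PS4 clock: {implied_ps4_ghz:.2f} GHz")
```

Both cases land on the numbers quoted above (~22% overhead, or a PS4 clock just over 2 GHz), which is why either explanation looks extreme.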
 