Xbox One (Durango) Technical hardware investigation

Status
Not open for further replies.
SemiAccurate would be very surprised if it was 128b wide, wires are cheap, power saving areas not. Why is this important? Unless Microsoft’s XBox One architects are masochists that enjoy doing needless and annoying work they would not have reinvented the wheel and put an arbitrarily clockable asynchronous interface between the NB and the CPU cores/L2s. Added complexity, lowered performance, and die penalty for absolutely no useful upside is not a good architectural decision. That means the XBox One’s 8 Jaguar cores are clocked at ~1.9GHz, something that wasn’t announced at Hot Chips.

Thanks to anyone who can decode this masterful madness.

He's saying that because the coherent bus bandwidth is 30 GB/s, working backwards and assuming the same 256b (32B) width as the DDR3 interface, the bus would run at 938 MHz. Multiply that by 2 and you get ~1.9 GHz on the CPU.

However I don't see why you can't do 1.5X on the frequency multiplier or just have it be async.
 
Did they ever find out whether it is sync or async? And when you say working backwards, how does the math apply there? And do you know what the B stands for?
 
I dunno if anyone is biased, and I don't think this is the right place to talk about it. I would just like to see the theory of why it's wrong, not just hear that it's wrong and that's it. I'm not a huge tech head like the guys on here, and I do like to read what everyone comes up with and why, even if I understand little of it. It's one of the only ways for me to learn more.

The theory of why it's wrong comes from the VGleaks docs: both PS4 and XBO have the same coherent bandwidth figure of 30 GB/s, and the CPU clock in the original doc is 1.6 GHz. Now you are saying you have an insider who claims the XBO has a more powerful CPU, correct?
 
Did they ever find out whether it is sync or async? And when you say working backwards, how does the math apply there? And do you know what the B stands for?

B = byte, b = bit, 1 byte = 8 bits; they are units of data.
So 30 GB/s / 32 B ≈ 938 MHz.

He's guessing it's synchronous because it's a simpler design.
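
To spell out the arithmetic, here is a minimal Python sketch. It assumes the 30 GB/s figure is decimal (10^9 bytes) and that the link is 32 bytes wide; the function name and the two clock ratios are only illustrative, nothing here is confirmed hardware detail.

```python
def bus_clock_hz(bandwidth_bytes_per_s: float, width_bytes: int) -> float:
    """Clock needed to move `bandwidth_bytes_per_s` over a bus `width_bytes` wide,
    assuming one transfer per clock."""
    return bandwidth_bytes_per_s / width_bytes

# Assumed inputs from the discussion: 30 GB/s coherent bandwidth, 32-byte bus.
bus_hz = bus_clock_hz(30e9, 32)

# CPU clock under two hypothetical bus-to-CPU ratios (2x is the article's guess).
cpu_hz_2x = bus_hz * 2.0
cpu_hz_1_5x = bus_hz * 1.5

print(f"bus: {bus_hz / 1e6:.1f} MHz")
print(f"CPU at 2x:   {cpu_hz_2x / 1e9:.3f} GHz")
print(f"CPU at 1.5x: {cpu_hz_1_5x / 1e9:.3f} GHz")
```

The 2x case gives roughly 1.9 GHz, but a 1.5x ratio or an asynchronous link would break that inference.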
 
The theory of why it's wrong comes from the VGleaks docs: both PS4 and XBO have the same coherent bandwidth figure of 30 GB/s, and the CPU clock in the original doc is 1.6 GHz. Now you are saying you have an insider who claims the XBO has a more powerful CPU, correct?

The same person who told me about a possible small GPU upclock of about 75 MHz. He told me that the CPU in the Xbox One is faster but never told me by how much. I assumed it was 1.8 GHz, and he never corrected me when I asked him about the speed. So I dunno.
 
Well, you seem knowledgeable. Do you think it's true, or do you think the VGleaks docs are correct?

VGleaks also said the GPU was 800 MHz, and that was true until it wasn't anymore. If MS was able to upclock the GPU, the possibility exists that it could happen on the CPU side as well.
 
The only problem with that theory is that there was some article on how the optimum heat/power ratio for that chip was at 1.6, where upping to 2.0 would mean a huge increase in inefficiency on that front. At least if I remember correctly. And 1.9GHz is very close to 2.0 ... ?

I'm guessing this is based upon Anand's remarks about the consoles' ideal clock speed? The heat concerns are unsubstantiated for a few reasons:

1. TDP != power consumption, especially when you're isolating the CPU from an SoC. Anand showed this himself in his Kabini review when he recorded the entire laptop drawing 11.5W under CPU load (the A4-5000 is rated at 15W).
2. The 15W -> 25W jump includes a GPU clock boost of 20%.
3. The Opteron X1150 (CPU-only Jaguar) shows otherwise. A 100% increase in CPU speed (1 GHz -> 2 GHz) results in only an ~89% increase in TDP (9W -> 17W).
4. Jaguar consumes so little power that even if a 25% bump to 2 GHz in the XB1 resulted in a 50% increase in power, you're looking at an extra 10W at most.
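
The percentages in points 3 and 4 can be checked with a few lines of Python. This is only a sketch of the arithmetic using the figures quoted above; the 20 W baseline is what the "extra 10W" claim implies, not a measured number.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase going from `old` to `new`."""
    return (new - old) / old * 100.0

# Point 3: Opteron X1150 figures as quoted (1 GHz / 9 W vs 2 GHz / 17 W).
clock_gain = pct_increase(1.0, 2.0)
tdp_gain = pct_increase(9.0, 17.0)

# Point 4: a 1.6 GHz -> 2.0 GHz bump, with a hypothetical 50% power increase.
bump = pct_increase(1.6, 2.0)
# "An extra 10 W at most" for +50% implies a CPU baseline of at most this many watts.
implied_baseline_w = 10.0 / 0.50

print(f"clock +{clock_gain:.0f}%, TDP +{tdp_gain:.1f}%")
print(f"bump +{bump:.0f}%, implied baseline <= {implied_baseline_w:.0f} W")
```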
 
They're using the 30 GB/s coherent link to calculate the clock rate of the CPU (30 GB/s / 32 byte bus width = 938 MHz), but if the clock rate of that link was tied to the CPU clock, then it would be higher than 30 GB/s, which is the same bandwidth they had listed on VGleaks at 800 MHz. Unless you believe the VGleaks stats were wrong about clock for the CPU from the start, I don't see how you can calculate the CPU clock in that manner.
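
One way to see the tension, as a rough Python sketch: if the link really were 32 bytes wide and ran at half the CPU clock (both assumptions taken from the article's reasoning, not confirmed), the VGleaks 1.6 GHz CPU clock would not produce 30 GB/s at all.

```python
LINK_WIDTH_BYTES = 32  # assumed width, as in the 938 MHz calculation

def link_bandwidth_gbs(cpu_clock_ghz: float, ratio: float = 2.0) -> float:
    """Link bandwidth in GB/s if the link runs at cpu_clock / ratio
    and moves LINK_WIDTH_BYTES per clock (hypothetical synchronous design)."""
    return cpu_clock_ghz / ratio * LINK_WIDTH_BYTES

bw_at_1_6 = link_bandwidth_gbs(1.6)      # VGleaks CPU clock
bw_at_1_875 = link_bandwidth_gbs(1.875)  # the clock the article infers

print(f"1.6 GHz CPU   -> {bw_at_1_6:.1f} GB/s")
print(f"1.875 GHz CPU -> {bw_at_1_875:.1f} GB/s")
```

So either the 1.6 GHz figure was wrong from the start, or the link is not a simple 32B, 2:1 synchronous design.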
 
Well, you seem knowledgeable. Do you think it's true, or do you think the VGleaks docs are correct?

It's an interesting guess, but the bus speed seems weird, so I don't know. I'm far more interested in knowing more about the eSRAM and the 2x2 gfx/compute cmd processors, which I am guessing would help with the CU utilization, but I can't seem to find anything about it.
 
I have doubts the numbers given are precise enough to give a full picture.
One thing I notice from the VGleaks Durango diagram is that the Northbridge-to-CPU links are listed at 20.8 GB/s.
There are a number of variables you can play with: bus width, whether the two links are separate, clock speed, etc.
What feasible combinations of bus width and clock lead to 20.8 GB/s? Neither 1.9 GHz nor half of it gives that number for any integer byte width.

The first seemingly reasonable numbers I could arrive at are 16B and 1.3 GHz, or possibly less reasonably 13B and 1.6 GHz.

Trying to get a good match for that bus with reasonable bus widths for coherent traffic, or for that matter the IO block's bus, makes me suspect that the various blocks have interfaces that run at different ratios. That doesn't say much either way about the CPU clocks, though.
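
The search over widths and clocks can be sketched in Python like this. It assumes one transfer per clock and decimal GB; the width range and helper name are arbitrary choices for illustration.

```python
TARGET_GBS = 20.8  # per-direction link figure from the VGleaks diagram

def implied_width_bytes(clock_ghz: float) -> float:
    """Bus width in bytes needed to reach TARGET_GBS at a given clock."""
    return TARGET_GBS / clock_ghz

# Clock required for each integer byte width from 8 to 32.
clock_for_width = {w: TARGET_GBS / w for w in range(8, 33)}

for width in (13, 16):
    print(f"{width} B -> {clock_for_width[width]:.3f} GHz")

# Neither 1.9 GHz nor half of it lands on an integer byte width.
for clock in (1.9, 0.95):
    print(f"{clock} GHz -> {implied_width_bytes(clock):.2f} B")
```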
 
I think the Microsoft presentation and the leaked documents on Durango have gaps in what they discuss that leave room for interpretation.

There is a link or links between the CPU clusters and the Northbridge. Vgleaks shows 20.8 in both directions from a Jaguar cluster, with both clusters getting their own pair of links.
Adding those up already exceeds memory bandwidth and the bandwidth of the Northbridge's coherent traffic.
That could mean that there is an ambiguity in the diagram as to whether those directions should be listed separately, although the Semiaccurate article also brings up the idea that the interface between the two L2s has been buffed up, so their higher numbers may assume inter-module sharing.

If the request queue for coherent traffic is similar to existing AMD chips, there is a central crossbar and queue all ordered memory clients plug into, which is the likely source of the 30GB/s limit.
Coherent traffic from the CPUs needs to go through that juncture.
Write-combining traffic that doesn't go through the caches might supply write bandwidth that goes beyond the initial 30 GB/s.
 
Thanks for the insight!
I really hope that somebody can get his hands on real documentation soon.
I have the PS2 Linux kit, which comes with all the developer documentation. It has specs for every single part and bus, and even real-life usage scenarios, so it's not limited to theoretical specs.
 
30 GB/s / 32 byte bus width = 938 MHz
I think it'd actually be more: 30 * 1024^3 / 32 ≈ 1006 MHz. But regardless of the math, the memory interface frequency has to play nicely with the memory frequency but doesn't have to have any "sane" relationship with CPU speed. The last two pages of this thread are tea-leaf reading at its finest.
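
For reference, a quick Python sketch of the two readings of a 30 GB/s figure over a 32-byte bus. Which definition of "GB" the leaked documents use is not known, so both are shown.

```python
WIDTH_BYTES = 32

# Decimal gigabytes: 1 GB = 10**9 bytes.
decimal_mhz = 30 * 10**9 / WIDTH_BYTES / 1e6

# Binary gigabytes (GiB): 1 GB = 1024**3 bytes.
binary_mhz = 30 * 1024**3 / WIDTH_BYTES / 1e6

print(f"decimal GB: {decimal_mhz:.1f} MHz")
print(f"binary GB:  {binary_mhz:.1f} MHz")
```

The roughly 7% gap between the two is already larger than the precision people are reading into the numbers.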
 
Another point against assuming a simple relationship between peak CPU clock and the northbridge is that both the GPU and CPUs are capable of ramping their clocks up and down, we've seen them tweak the GPU without propagating that change through the chip, and northbridge clocks have ways of being kept at least partly decoupled.

At the high level of the leaks so far, I think we can see several likely coarse clock domains--not counting the local domains within them. Disclosures about the complexity of other modern cores and power management make raising the specter of world-ending complexity because the APU doesn't use a 1:2 clock ratio sound a little excessive.
 

Another great reply, thanks. Do you mean that you can use it independently? Let's say you program a game from scratch -not from scratchpad :eek: , sorry for the bad joke- and don't want to use the DDR3 memory at all. If your game fits on the eSRAM, could you use it single-handedly to run it without DDR3?

Thanks glad I could help but your question has run into the limits of my knowledge! ;)

Whether a game could be run entirely from the ESRAM depends on how access to this pool is handled and whether there is coherency between the GPU and CPU caches (L1 and L2). On the first point, I've seen some back and forth about whether the GPU has exclusive access to the ESRAM or whether the CPU can access it at the cost of forcing a GPU cache flush (which ties into the second point and is not good, as it essentially stalls the GPU). The bigger obstacle is that it's only 32MB, and most of the techniques that play to the ESRAM's strengths have the framebuffer stored there, leaving little room for the rest of your code.

I look forward to the more knowledgeable folk on this board correcting me soon! :D
 

I'm guessing this is based upon Anand's remarks about the consoles' ideal clock speed? The heat concerns are unsubstantiated for a few reasons:

1. TDP != power consumption, especially when you're isolating the CPU from an SoC. Anand showed this himself in his Kabini review when he recorded the entire laptop drawing 11.5W under CPU load (the A4-5000 is rated at 15W).
2. The 15W -> 25W jump includes a GPU clock boost of 20%.
3. The Opteron X1150 (CPU-only Jaguar) shows otherwise. A 100% increase in CPU speed (1 GHz -> 2 GHz) results in only an ~89% increase in TDP (9W -> 17W).
4. Jaguar consumes so little power that even if a 25% bump to 2 GHz in the XB1 resulted in a 50% increase in power, you're looking at an extra 10W at most.

Anand also wrote that before he even tested Jaguar. I'd guess that the decision to limit the CPU to 1.6 GHz is for yield reasons, not due to concerns about Jaguar's power efficiency at 2 GHz.
 