Xbox One (Durango) Technical hardware investigation

Read the entire sentence, please. "Jaguar in the Xbox One" is not the Jaguar in your PC (and on top of that, calling the GPU in the Xbox One "Jaguar" is probably a bad idea to begin with). Non-CPU-cache-coherent GPU bandwidth on the X1 is 68GBps.

Read the entire article again. The author only uses Jaguar to refer to the CPU.
Also the author is comparing it to Intel CPU bandwidth.

So do the math
 
Oh, okay. That doesn't sound like something I'd say. What's the difference between a software cache and a scratchpad? Does it have any implications for the framebuffer size?

A cache is handled by system logic 'automatically'; a scratchpad is explicitly utilised and managed by the running program. It's a more complex thing to use and exploit to its full potential, and it's very unlike the eDRAM in Haswell, which 'just works' as far as any PC programmer is concerned.
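
To make that concrete, here's a minimal sketch in C of what the difference looks like from the programmer's side (illustrative only, not real console code; scratch and TILE are invented names):

/* With a cache, you just touch memory; the hardware keeps hot data close. */
void halve_cached(float *data, int n)
{
    for (int i = 0; i < n; i++)
        data[i] *= 0.5f;    /* cache logic manages locality automatically */
}

/* With a scratchpad, the program explicitly stages data into the fast
 * memory, works on it there, and copies it back out. */
#define TILE 1024
void halve_scratchpad(float *data, float *scratch, int n)
{
    for (int base = 0; base < n; base += TILE) {
        int len = (n - base < TILE) ? (n - base) : TILE;
        for (int i = 0; i < len; i++)    /* explicit copy in             */
            scratch[i] = data[base + i];
        for (int i = 0; i < len; i++)    /* work entirely in fast memory */
            scratch[i] *= 0.5f;
        for (int i = 0; i < len; i++)    /* explicit copy out            */
            data[base + i] = scratch[i];
    }
}

Same result either way; the second version is just much more work to get right, which is the trade-off being described.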
 
A cache is handled by system logic 'automatically'; a scratchpad is explicitly utilised and managed by the running program. It's a more complex thing to use and exploit to its full potential, and it's very unlike the eDRAM in Haswell, which 'just works' as far as any PC programmer is concerned.

Thank you for the explanation of the difference between the two.

It makes sense for a console vs. a PC: to get the full potential out of the XO you have to code for it, whereas a PC has to run more general code.

I remember from the Crystalwell review that it could be partitioned to be used as an L4 by the CPU and the GPU, is that right?

Moreover, what kind of cache is Crystalwell: is it inclusive or exclusive?
Does its memory need to be replicated in the cache/RAM hierarchy?
(My bad, I wish I knew more :cry:, that's why I read here.)


Anyway, it brings the benefits of a cache: a fast, low-latency memory close to its user (in this case the GPU), avoiding costly RAM accesses as much as possible.

Well, time will tell the benefits of eSRAM and how devs use it.

Thanks.
 
Actually it's in an image on the first link you posted:

http://www.extremetech.com/wp-content/uploads/2013/08/XBO_diagram_WM.jpg

There's a black arrow linking the CPU bus to the eSRAM bus.

Edit: Oh, someone already posted it in better quality.
 
The author only uses Jaguar to refer to the CPU.
I stand corrected on this. The slides, however, clearly indicate DRAM BW at 68GBps, with CPU-cache-coherent BW limited to 30GBps.
http://pc.watch.impress.co.jp/img/pcw/docs/612/762/html/05.jpg.html
http://pc.watch.impress.co.jp/img/pcw/docs/612/762/html/10.jpg.html
Multicore x64 processors don't have to access memory in a cache-coherent manner, so it's not like the CPU would be bound by this 30GBps BW.

So do the math
Thanks but no thanks. Even if you deem non-cache-coherent access useless and count only the 30GBps figure, the difference between this and 51.2GBps is not +300%.
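
For the record, the arithmetic (a quick check in C, taking the figures in this thread at face value):

#include <stdio.h>

int main(void)
{
    double coherent = 30.0;  /* CPU-cache-coherent BW from the slides */
    double pc       = 51.2;  /* the 51.2GBps PC figure being compared */
    double dram     = 68.0;  /* X1 total DDR3 BW                      */

    printf("51.2 vs 30:  +%.0f%%\n", (pc / coherent - 1.0) * 100.0);  /* +71% */
    printf("68 vs 51.2:  +%.0f%%\n", (dram / pc - 1.0) * 100.0);      /* +33% */
    return 0;
}

Neither gap is anywhere near +300%.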

It's been suggested multiple times and I'll offer this advice once again: change your attitude, please. :)
 
Charlie theorizes that the CPU may be clocked around 1.9GHz. It kind of makes sense to me that the interface between memory and the CPU would be synchronous.

The CPUs connect to four 64b wide 2GB DDR3-2133 channels for a grand total of 68GB/sec bandwidth. Do note that this number exactly matches the width of a single on-die memory block. One interesting thing to note is that the speed of the CPU MMU’s coherent link to the DRAM controller is only 30GBps, something that strongly suggests that Microsoft sticks with Jaguar’s half-clock speed NB. If the NB to DRAM controller is 256b/32B wide, that would mean it runs at about 938MHz, 1.88GHz if it is 128b/16B wide.
SemiAccurate would be very surprised if it was 128b wide, wires are cheap, power saving areas not. Why is this important? Unless Microsoft’s XBox One architects are masochists that enjoy doing needless and annoying work they would not have reinvented the wheel and put an arbitrarily clockable asynchronous interface between the NB and the CPU cores/L2s. Added complexity, lowered performance, and die penalty for absolutely no useful upside is not a good architectural decision. That means the XBox One’s 8 Jaguar cores are clocked at ~1.9GHz, something that wasn’t announced at Hot Chips. Now you know.

Full article:

http://semiaccurate.com/2013/08/29/a-deep-dive-into-microsofts-xbox-ones-architecture/
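
As a sanity check on the "grand total" in that quote, the standard bandwidth arithmetic (transfer rate x bus width x channels) does land on the same number:

#include <stdio.h>

int main(void)
{
    double transfers = 2133e6;  /* DDR3-2133: 2133 MT/s     */
    double width     = 8.0;     /* 64-bit channel = 8 bytes */
    int    channels  = 4;

    /* 2133e6 * 8 * 4 = ~68.3 GB/s, the quoted 68GB/sec */
    printf("%.1f GB/s\n", transfers * width * channels / 1e9);
    return 0;
}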
 
The only problem with that theory is that there was some article on how the optimum heat/power ratio for that chip was at 1.6, where upping to 2.0 would mean a huge increase in inefficiency on that front. At least if I remember correctly. And 1.9GHz is very close to 2.0 ... ?
 
The only problem with that theory is that there was some article on how the optimum heat/power ratio for that chip was at 1.6, where upping to 2.0 would mean a huge increase in inefficiency on that front. At least if I remember correctly. And 1.9GHz is very close to 2.0 ... ?

That is true, but I would say that is true for Kabini only. Even though they share components, the physical designs wouldn't be equivalent. The design could very well be synthesized for a different target frequency with different power characteristics.
 
The only problem with that theory is that there was some article on how the optimum heat/power ratio for that chip was at 1.6, where upping to 2.0 would mean a huge increase in inefficiency on that front. At least if I remember correctly. And 1.9GHz is very close to 2.0 ... ?

The percentage increase in power consumption to go from 1.6GHz to 2.0GHz was high, but in the end the absolute number wasn't so offensive, given each core uses something like 8 watts. So while the sweet spot for price/performance was 1.6GHz, maybe MS felt the extra 4W or so was worthwhile.
 
The only problem with that theory is that there was some article on how the optimum heat/power ratio for that chip was at 1.6, where upping to 2.0 would mean a huge increase in inefficiency on that front. At least if I remember correctly. And 1.9GHz is very close to 2.0 ... ?

If we were talking desktop i7s, it would probably be a problem. But Jaguars are low-powered hardware. A 66% increase in TDP, when you're talking about moving from 15W to 25W, can be more readily accommodated.
 
Since the 66% power increase between 1.6 and 2.0 isn't linear, that efficiency drop could well translate into a much smaller wattage increase at 1.88.
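
A toy illustration of that point, using the usual f*V^2 rule of thumb for dynamic power (the voltage figures below are invented purely for the example, picked so the 2.0GHz point lands near the quoted 66%):

#include <stdio.h>

/* Dynamic power scales roughly with frequency * voltage^2. */
static double rel_power(double f, double v) { return f * v * v; }

int main(void)
{
    double base = rel_power(1.60, 1.00);  /* 1.6GHz baseline at a hypothetical 1.00V */

    printf("1.88GHz: +%.0f%%\n", (rel_power(1.88, 1.05) / base - 1.0) * 100.0); /* ~+30% */
    printf("2.00GHz: +%.0f%%\n", (rel_power(2.00, 1.15) / base - 1.0) * 100.0); /* ~+65% */
    return 0;
}

If the voltage only has to climb steeply near the top of the curve, 1.88GHz could cost a fraction of what 2.0GHz does.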
 
FWIW, if the clock is indeed 1.88GHz, the XBO would theoretically have 89 GFLOPS free after the 2-core OS reservation, compared to the 75 GFLOPS it would have had available at 1.6GHz. If what we hear about the audio chip is true and it's leveraged properly, all of that 89 should be available for game-related functions.
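
The arithmetic behind numbers like these (a sketch assuming the usual 8 single-precision FLOPs/cycle/core for Jaguar and 6 cores left for games; the 75/89 above presumably use slightly different rounding):

#include <stdio.h>

int main(void)
{
    int cores = 6;            /* 8 cores minus the 2-core OS reservation    */
    int flops_per_cycle = 8;  /* Jaguar: 128-bit FPU, 4 SP adds + 4 SP muls */

    printf("1.60GHz: %.1f GFLOPS\n", cores * flops_per_cycle * 1.60);  /* 76.8 */
    printf("1.88GHz: %.1f GFLOPS\n", cores * flops_per_cycle * 1.88);  /* 90.2 */
    return 0;
}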
 
The CPU clock is interesting news, because eastmen had said some weeks ago he had heard the One CPU was not clocked at 1.6, though he didn't know what the speed was (my guess was 2x the GPU clock = ~1.7GHz, if anything).
 
BTW, Scott Wasson at Tech Report has a very interesting take on the Hot Chips One specs up:

http://techreport.com/news/25288/xbox-one-soc-rivals-amd-tahiti-for-size-complexity

This chip's use of DDR3 system memory presents an intriguing contrast to the otherwise-similar SoC in the PlayStation 4. Sony chose to employ much higher bandwidth GDDR5 memory, instead. The Xbox One's architects sought to overcome this bandwidth disparity by putting quite a bit of fast eSRAM memory on the SoC die. This decision hearkens back to the Xbox 360's use of an external 10MB eDRAM chip, but it also participates in an emerging trend in high-performance SoC architectures.

Scott also makes a very underreported point IMO, one that I believe BeyondTed first brought to my attention: that on-chip memory accesses are much more efficient from a power-usage perspective than off-chip ones. It could be important given these SoCs operate within strict power envelopes.

For instance, Intel incorporated 128MB of eDRAM onto the Haswell GT3e package to serve as an L4 cache, primarily for graphics, in order to overcome the bandwidth limitations of the CPU socket. The result is quite credible performance from an integrated graphics solution—and it comes in the context of a relatively modest 47W power budget, in part because local cache accesses have lower power costs than going out to main memory. The GT3e also happens to be a virtuoso in bandwidth-bound CPU workloads like computational fluid dynamics thanks to that massive L4 cache.

And this is the image he linked (4950HQ = Haswell with 128MB eDRAM):

[Image: euler3d-max.png, Euler3D benchmark results at maximum threads]


First time I've seen a concrete example of local cache providing big gains in a CPU benchmark. I am not sure if fluid dynamics is really a killer app though.

To add to the impressiveness, the 4950HQ is evidently a 2.4GHz chip that turbos to 3.6GHz. One would think that on a max-threads bench it would run closer to 2.4GHz, putting it at quite the clock disadvantage as well.
 
FWIW, if the clock is indeed 1.88GHz, the XBO would theoretically have 89 GFLOPS free after the 2-core OS reservation, compared to the 75 GFLOPS it would have had available at 1.6GHz. If what we hear about the audio chip is true and it's leveraged properly, all of that 89 should be available for game-related functions.

Not to make this a VS related question - but how does this compare to what we know of the similar Jag setup in the PS4? Do we know what clock that is set at yet, etc?
 
The CPUs connect to four 64b wide 2GB DDR3-2133 channels for a grand total of 68GB/sec bandwidth. Do note that this number exactly matches the width of a single on-die memory block. One interesting thing to note is that the speed of the CPU MMU’s coherent link to the DRAM controller is only 30GBps, something that strongly suggests that Microsoft sticks with Jaguar’s half-clock speed NB.

I do not follow this math, can anyone translate what they are saying?
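
For what it's worth, it's just bandwidth = bus width x clock, rearranged. A sketch, using the 30GB/s coherent-link figure from the slides:

#include <stdio.h>

int main(void)
{
    double bw = 30e9;  /* coherent link bandwidth from the slides */

    /* clock = bandwidth / bus width */
    printf("32B-wide link: %.0f MHz\n", bw / 32.0 / 1e6);  /* ~938MHz  */
    printf("16B-wide link: %.2f GHz\n", bw / 16.0 / 1e9);  /* ~1.88GHz */
    return 0;
}

Since Jaguar's northbridge runs at half the core clock, a 32B-wide link at ~938MHz implies cores at ~1.88GHz; SemiAccurate argues the 16B case is unlikely because wires are cheap, which is how the article arrives at its ~1.9GHz figure.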
 
Scott also makes a very underreported point IMO, one that I believe BeyondTed first brought to my attention: that on-chip memory accesses are much more efficient from a power-usage perspective than off-chip ones. It could be important given these SoCs operate within strict power envelopes.
I think you're talking about a few watts here, not 50-100W or anything that's of concern to a mains-powered product (maybe I'm wrong on that?). It'll be good for an i7 MacBook to get better performance per watt off battery than the same performance from a discrete GPU, but I don't see the real-world benefit for a traditional games console.
 
On a side note, it would seem the XB1 SoC is (for now) the second-largest chip produced in terms of transistors.

The only one I can think of that's higher is GK110 and its variants. The X1 has more transistors than Tahiti and any Nvidia chip besides GK110.
 
On a side note, it would seem the XB1 SoC is (for now) the second-largest chip produced in terms of transistors.

The only one I can think of that's higher is GK110 and its variants. The X1 has more transistors than Tahiti and any Nvidia chip besides GK110.
No. Please stop with the transistor count as some kind of important metric; it's a pointless exercise. Think about this:

An 8Gb DDR3 chip is somewhere between 8 and 9 billion transistors, and it costs under $5.

A 128Gb flash chip is 60 billion transistors. It's only 146mm² and it costs under $10.
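
Back-of-the-envelope on those counts (DRAM stores one bit per transistor in a 1T1C cell, MLC flash stores two bits per floating-gate transistor; array overhead and peripheral logic ignored):

#include <stdio.h>

int main(void)
{
    double ddr3_bits  = 8e9;    /* 8Gb DDR3 chip        */
    double flash_bits = 128e9;  /* 128Gb MLC flash chip */

    printf("DDR3:  ~%.0f billion transistors\n", ddr3_bits / 1e9);        /* ~8  */
    printf("Flash: ~%.0f billion transistors\n", flash_bits / 2.0 / 1e9); /* ~64 */
    return 0;
}

Same ballpark as the figures above, and both chips cost a few dollars, which is the point: raw transistor count says very little about a chip's complexity or cost.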
 