Xbox One (Durango) Technical hardware investigation

The difference between 28 cycles and 32 cycles is 15%. Is that small enough to be insignificant?
...And the difference between 1 and 2 is 100%, surely that's an enormous difference! ;) The diff between 28 and 32 is just 4 cycles. Computers deal with cycles, not percent. Percentages are a tool humans use, mainly to make certain numbers and mathematical illustrations easier to grasp for our limited minds. Sometimes percentages play tricks on us and make differences look bigger than they really are. 4 cycles @ 800MHz is what, 5ns? Not all that much, really.
 
...And the difference between 1 and 2 is 100%, surely that's an enormous difference! ;) The diff between 28 and 32 is just 4 cycles. Computers deal with cycles, not percent. Percentages are a tool humans use, mainly to make certain numbers and mathematical illustrations easier to grasp for our limited minds. Sometimes percentages play tricks on us and make differences look bigger than they really are. 4 cycles @ 800MHz is what, 5ns? Not all that much, really.

And that 100% difference can be huge if maintained over millions of cycles. A 1.6 GHz processor will complete two cycles in the time an 800 MHz processor completes one.

At 800 MHz, 28 vs 32 cycles represents roughly 28.5 million serial accesses to memory per second versus 25 million.
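
(A quick back-of-envelope check of those figures, assuming the 800 MHz clock discussed above and strictly one-at-a-time accesses, which real hardware doesn't actually do:)

```python
# Back-of-envelope latency arithmetic. Assumes an 800 MHz clock and strictly
# serial accesses (one request completes before the next begins); real
# hardware pipelines and overlaps requests, so these are illustrative only.
clock_hz = 800e6
cycle_ns = 1e9 / clock_hz              # 1.25 ns per cycle

for cycles in (28, 32):
    latency_ns = cycles * cycle_ns     # 35 ns vs 40 ns
    per_second = clock_hz / cycles     # ~28.6M/s vs 25M/s
    print(f"{cycles} cycles -> {latency_ns:.1f} ns, "
          f"{per_second / 1e6:.1f}M serial accesses/s")

# The 4-cycle gap per access is 4 * 1.25 ns = 5 ns, as noted above.
```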

But my question still stands: is that enough to significantly impact performance?
 
But my question still stands: is that enough to significantly impact performance?
Of course it will, depending on how you define "significantly". It's completely subjective/arbitrary. Now weigh in the fact that an SRAM array is several hundred percent larger than the equivalent DRAM - is that still worth 15% performance? (Those numbers are just pulled out of the air, btw, but I'll ride with what we've been using so far, for consistency's sake.) I would say that for MS, probably not.

50% greater performance, MAYBE that would be worth the cost. But 15%? No, I don't really see that happening. Of course, I could be wrong, but seeing as only the highest of high-end CPUs - big-iron server chips - have on the order of 32MB of pure SRAM on-die, I don't see it happening in a consumer device. That would not only be unprecedented, it would be revolutionary.
 
Saw this on GAF. Does the bolded hint at anything perhaps? :?:

For those interested in the technology behind Infiltrator's impressive demonstration, here’s a brief overview of the features on show:

• New material layering system, which provides unprecedented detail on characters and objects
• Dynamically lit particles, which can emit and receive light
• High-quality temporal anti-aliasing, eliminating jagged edges and temporal aliasing
• Thousands of dynamic lights with tiled deferred shading
• Adaptive detail levels with artist-programmable tessellation and displacement
• Millions of particles colliding with the environment using GPU simulation
• Realistic destructibles, such as walls and floors
• Physically based materials, lighting and shading
• Full-scene High Dynamic Range reflections with support for varying glossiness
• Illuminating Engineering Society (IES) profiles for realistic lighting distributions
 
I really doubt that MS has been going around calling it eSRAM when it's really eDRAM. Yeah, I know about 1T-SRAM, but that's a pretty defunct brand name. More than that, the designation really doesn't make sense for embedded memory; the reason they're called PSRAMs (pseudo-SRAMs) is that they have an external interface identical to SRAM. They can be used with devices that normally only support SRAM by putting the DRAM controller and some buffering logic on the memory die to hide this. If you're talking about embedded memory there's no external interface to begin with, so it doesn't really make sense.

Yeah, 32MB of SRAM on the die is a lot, but TSMC's 28nm process is pretty dense. By all indications a few times denser than IBM's 45nm process, so I don't think a comparison to those big-iron POWER CPUs is really on the level. I think it's realistic for a die size that makes sense for MS. Expensive, yes, a big tradeoff, but realistic.

But frankly, I don't think they were motivated by performance on this one. Oh, they'll try to spin it that way (AFAIK they already are in their documentation), but I think it was entirely down to practicality. eDRAM is getting more expensive to implement and it's harder to find fabs that are able to do it on their most cutting-edge process. Assuming that MS is keeping the standard console model with this one, they'll want something that scales predictably in cost reduction over several years and is guaranteed to be something they can easily have manufactured. It may well be more expensive today to use 32MB of eSRAM than eDRAM (or not, I really don't know), but that doesn't necessarily stay true a node, or two, or three ahead. I'm sure there's at least a cutoff point where it doesn't make sense today; for instance, I doubt you'd put down 2MB of eDRAM on a current chip. Consider that MS never integrated the Xbox 360's eDRAM die onto another chip - it may have never made sense for whatever reason - and if that's still the case then it's not hard to see how it's a loss.

Now if we were talking about something where there's value in increasing the amount over time then that could change the dynamic, but I don't think that's what we're talking about.
 
A number of modern game renderers do some kind of multiple-pass rendering that defers shading and uses tiling. It's not a reflection of the GPU the system might have.

Ah, gotcha. Has Epic used that in the past? My loose conjecture is that MS canvassed the top engine makers, like Arthur said, and designed their hardware around the trends in engine design. As such, I'm curious what these trends are and whether they really have any relation to tile-based rendering. Epic was the only major engine designer left that hadn't talked up virtual textures or virtual geometry in some form. Until I saw that today.

Would SVOs work well with a tile-based setup? I know id and Epic both wanted to utilize them in different ways next gen, but at least for Epic that's up in the air for the time being.
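
(For anyone unsure what "tiling" means in this purely software sense - nothing to do with a TBDR GPU - here's a very rough Python sketch of per-tile light binning for tiled deferred shading. The tile size, light representation and function name are all made up for illustration; real engines run the equivalent loop on the GPU in a compute shader.)

```python
# Illustrative sketch of software "tiled deferred" light binning.
# Tile size and data layout are invented for this example; real engines
# run the equivalent loop on the GPU in a compute shader.
TILE = 16  # pixels per tile side

def bin_lights(lights, screen_w, screen_h):
    """Assign each light to every screen tile its bounding rect touches."""
    tiles_x = (screen_w + TILE - 1) // TILE
    tiles_y = (screen_h + TILE - 1) // TILE
    bins = [[] for _ in range(tiles_x * tiles_y)]
    for light in lights:
        x0, y0, x1, y1 = light["rect"]  # screen-space bounding box, inclusive
        for ty in range(max(0, y0 // TILE), min(tiles_y, y1 // TILE + 1)):
            for tx in range(max(0, x0 // TILE), min(tiles_x, x1 // TILE + 1)):
                bins[ty * tiles_x + tx].append(light)
    return bins

# The shading pass then walks each tile's short light list instead of
# testing every pixel against all "thousands of dynamic lights".
```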
 
A number of modern game renderers do some kind of multiple-pass rendering that defers shading and uses tiling. It's not a reflection of the GPU the system might have.

Frostbite 2 comes to mind.


The last 10 pages have been nothing but a pain to read, even for me; I don't know how you other guys put up with all the "quick, there's a word, google it". Also, with all this memory access talk: if it was that important and provided that much improvement, regardless of DRAM or SRAM, wouldn't they just clock it to the speed needed to hit their performance target? ;)
 
I really doubt that MS has been going around calling it eSRAM when it's really eDRAM. Yeah, I know about 1T-SRAM, but that's a pretty defunct brand name.

I think you're right on the manufacturability going forward. Even if we're dealing with 60-80mm² for the SRAM and ~300mm² for the APU, after two nodes we're probably looking at a die that's under 100mm² and really cheap to manufacture. They seem to be focused not on what's cheapest now, but on what's cheapest when they need the BOM to hit under $99.

I also found this interesting. This is from the leaked ~3-year-old presentation. It explicitly calls out eSRAM as an option over eDRAM. If the eSRAM were just eDRAM, I'm not sure it would be on this slide.

Slide10.jpg
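
(A rough check of that shrink arithmetic, assuming an idealised 50% area reduction per full node - optimistic, since SRAM and analog scale worse than logic - and the guessed die sizes from the post above:)

```python
# Idealised die-shrink arithmetic: ~0.5x area per full node shrink.
# Starting sizes are the rough guesses quoted above, not official figures.
apu_mm2, esram_mm2 = 300, 70
total = apu_mm2 + esram_mm2
for shrinks in range(3):
    print(f"after {shrinks} shrink(s): ~{total * 0.5 ** shrinks:.0f} mm^2")
# after 2 shrinks: ~92 mm^2, i.e. in the "under 100 mm^2" ballpark
```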
 
Frostbite 2 comes to mind.


The last 10 pages have been nothing but a pain to read, even for me; I don't know how you other guys put up with all the "quick, there's a word, google it". Also, with all this memory access talk: if it was that important and provided that much improvement, regardless of DRAM or SRAM, wouldn't they just clock it to the speed needed to hit their performance target? ;)

Because clocking things to a certain speed just isn't practical if you want enough of them available to sell to customers, I think. The yields may not be so good, so I suppose it's much easier to just try your best to maximize what you get out of a much safer clock speed.
 
And what is your source for this? MS's own patents pretty explicitly suggest otherwise, and by the sound of it they explain a lot about the stuff we do know. Maybe the 'deferred' part is what's off?


They also have patents for ray tracing hardware, maybe that's in the console too...

I've asked if there was any truth to speculation that Durango is designed for TBDR and have been told it's 'wishful thinking'.
 
Yeah, 32MB of SRAM on the die is a lot, but TSMC's 28nm process is pretty dense.
Doesn't matter how dense it is; SRAM will still be proportionally larger than DRAM by basically the same factor. If there's really 32MB of bona fide SRAM on that die it's going to eat up upwards of 40, maybe 50% of the die space. It'll be major, for sure. Hard to see how a fairly minor performance difference could be worth such a big investment in silicon. Bragging rights alone don't carry you very far with the ignorant populace out there - nobody's gonna care other than the fanboys, and they're already sold on your shit anyway, so that's not a win.

eDRAM is getting more expensive to implement and it's harder to find fabs that are able to do it on their most cutting-edge process.
If that really is their excuse/motivation for going with far, far larger SRAM, then they seriously need to think about why they believe they need the on-chip memory in the first place, methinks.

32MB of SRAM really, really is a huge amount of memory. A quad-core i7 only has 9MB of SRAM (excluding the GPU-only LLC of Ivy Bridge, which I don't know the capacity of), but it's still roughly half the die, give or take a bit. Stepping up to 32MB, that's... huge. HUGE. There's so much logic they could have sunk into the chip with that much space. 32MB × 8 bits per byte × 6 transistors per bit - that's a billion and a half transistors just for the SRAM arrays. And that's not factoring in anything else: redundancy (if applicable), control lines and attached logic for handling access conflicts, snooping, resolves, DMA and so on. All of that would weigh out to even more in total.
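
(A quick sanity check on that transistor figure - bitcells only, ignoring all the surrounding logic just listed:)

```python
# Bitcell-only transistor count for a 32MB 6T SRAM array; excludes
# redundancy, sense amps, decoders and other peripheral logic.
bits = 32 * 1024 * 1024 * 8      # 268,435,456 bits
transistors = bits * 6           # six transistors per cell
print(f"{transistors / 1e9:.2f} billion transistors")  # ~1.61 billion
```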

I'd be fucking super duper amazed if they'd actually do this. Really really big "if".
 
Doesn't matter how dense it is; SRAM will still be proportionally larger than DRAM by basically the same factor. If there's really 32MB of bona fide SRAM on that die it's going to eat up upwards of 40, maybe 50% of the die space. It'll be major, for sure. Hard to see how a fairly minor performance difference could be worth such a big investment in silicon. Bragging rights alone don't carry you very far with the ignorant populace out there - nobody's gonna care other than the fanboys, and they're already sold on your shit anyway, so that's not a win.

Your math is off. Even at the same density as the GPU logic, 32MB of eSRAM wouldn't be 40% of the die - and it will be much denser than that, probably double or better.
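
(Rough numbers to back that up - the bitcell size is roughly what TSMC has quoted for its 28nm-class processes and the array-efficiency factor is a guess, so treat these as ballpark only:)

```python
# Back-of-envelope area for 32MB of 6T SRAM at 28nm.
# Assumptions: ~0.15 um^2 per 6T bitcell and ~55% array efficiency once
# sense amps, decoders, redundancy and routing are added. Both are guesses.
bitcell_um2 = 0.15
array_efficiency = 0.55
bits = 32 * 1024 * 1024 * 8

raw_mm2 = bits * bitcell_um2 / 1e6               # ~40 mm^2 of pure cells
with_overhead_mm2 = raw_mm2 / array_efficiency   # ~73 mm^2

for die_mm2 in (300, 350, 400):
    share = with_overhead_mm2 / die_mm2
    print(f"~{with_overhead_mm2:.0f} mm^2 is {share:.0%} of a {die_mm2} mm^2 die")
# Roughly 18-24% of the die: a big chunk, but nowhere near 40-50%.
```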
 
Thanks for the response, I was just curious because it seems to be gaining traction in some of the other outlets. Most if not all of this stuff is kind of over my head, but I enjoy reading it since the systems aren't out to play yet :)
 
Because clocking things to a certain speed just isn't practical if you want enough of them available to sell to customers, I think. The yields may not be so good, so I suppose it's much easier to just try your best to maximize what you get out of a much safer clock speed.

You are talking about a 15% access time difference; if it bought the console that much performance then they would clock it 15% higher and pay the price (whatever that is - yields, reduced clocks on other components, etc.). The reality is it probably makes very little difference: it's already way faster than main memory and provides more throughput, and that is the important bit.
 
Doesn't matter how dense it is; SRAM will still be proportionally larger than DRAM by basically the same factor. If there's really 32MB of bona fide SRAM on that die it's going to eat up upwards of 40, maybe 50% of the die space. It'll be major, for sure. Hard to see how a fairly minor performance difference could be worth such a big investment in silicon. Bragging rights alone don't carry you very far with the ignorant populace out there - nobody's gonna care other than the fanboys, and they're already sold on your shit anyway, so that's not a win.


If that really is their excuse/motivation for going with far, far larger SRAM, then they seriously need to think about why they believe they need the on-chip memory in the first place, methinks.

32MB of SRAM really, really is a huge amount of memory. A quad-core i7 only has 9MB of SRAM (excluding the GPU-only LLC of Ivy Bridge, which I don't know the capacity of), but it's still roughly half the die, give or take a bit. Stepping up to 32MB, that's... huge. HUGE. There's so much logic they could have sunk into the chip with that much space. 32MB × 8 bits per byte × 6 transistors per bit - that's a billion and a half transistors just for the SRAM arrays. And that's not factoring in anything else: redundancy (if applicable), control lines and attached logic for handling access conflicts, snooping, resolves, DMA and so on. All of that would weigh out to even more in total.

I'd be fucking super duper amazed if they'd actually do this. Really really big "if".


You really think it's so inconceivable that Microsoft would do this, and that it would truly make only a minor performance difference if they did? Just thinking about what it means in an APU design, 32MB of legit SRAM seems like a pretty damn good move. It's like having an L3 cache on board, only that cache is dedicated strictly to the GPU and its memory clients.

And when you say there will only be a minor performance difference, might you not be overlooking the fact that Microsoft isn't expecting it to carry the entire graphical load, or suddenly make a 1.2 TFLOP GPU perform like a 2 TFLOP part, but just to make certain crucial tasks much faster and cheaper - which, when a dev steps back and looks at their overall efficiency gains, may turn out to be well worth it? I think it's hardly a matter of fanboys and their bragging rights; it's more that we're simply discussing one of the more interesting aspects of the console's design and how, if at all, it could be beneficial to games on Durango.

This isn't anyone saying it magically makes the machine stronger than or equal to Sony's. And, to be honest, I don't think it really matters, because it will have little to no bearing on whether or not the games themselves are good. Durango will be more than sufficiently powerful to produce incredible-looking games. I suspect that, along the way, developers will find there are some useful development benefits to the console's eSRAM, same as they did with the 360's eDRAM. It just made certain things less of a bottleneck to overall performance, and developers found it useful. The eSRAM carries that same potential, but possibly more, due to the increased versatility of how devs can utilize it.

That semiaccurate forum post is all kinds of insane, and not in a good way. :)

You are talking about a 15% access time difference; if it bought the console that much performance then they would clock it 15% higher and pay the price (whatever that is - yields, reduced clocks on other components, etc.). The reality is it probably makes very little difference: it's already way faster than main memory and provides more throughput, and that is the important bit.

Isn't it more than just this, though? ERP suggested that if it was real SRAM and similar to L2 cache performance, a cache miss would drop from 300+ GPU cycles to 10-20 cycles. He also said a shader spends more time waiting on memory than computing values, and if that's truly the case, why wouldn't the SRAM potentially be pretty helpful for Durango development?
 
Isn't it more than just this, though? ERP suggested that if it was real SRAM and similar to L2 cache performance, a cache miss would drop from 300+ GPU cycles to 10-20 cycles. He also said a shader spends more time waiting on memory than computing values, and if that's truly the case, why wouldn't the SRAM potentially be pretty helpful for Durango development?

Now you're talking about something completely different from what the physical array is (i.e. some form of SRAM or some form of DRAM); now you're talking about a memory hierarchy. Everyone has been talking from a "scratchpad" perspective, which means it's not a cache, so you would only get a benefit on an L2 miss if the data is being held in the 32MB *RAM block. At that point, 28 or 32 cycles, SRAM or DRAM, doesn't make a huge difference, because either way you're saving big against the access latency of the main DDR3 memory.

If there is an L2 miss and the data is held in DDR3, the *RAM block won't help you; it's not a cache.
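
(A toy model of why the 28-vs-32-cycle question mostly washes out once DDR3 is in the mix. Every latency and traffic fraction below is invented purely for illustration, and the scratchpad is modelled as data the developer explicitly placed there, not as a cache:)

```python
# Toy average-latency model for a stream of GPU memory requests.
# All latencies (cycles) and traffic fractions are made-up illustrative
# numbers; the point is how little 28 vs 32 cycles moves the average
# once DDR3-bound misses are included.
def avg_latency(scratchpad_cycles, frac_l2=0.5, frac_scratchpad=0.3,
                frac_ddr3=0.2, l2_cycles=20, ddr3_cycles=300):
    # A request either hits the GPU L2, targets data the developer placed
    # in the 32MB scratchpad (it is not a cache, so an L2 miss to data
    # living in DDR3 still pays full DDR3 latency), or goes out to DDR3.
    return (frac_l2 * l2_cycles
            + frac_scratchpad * scratchpad_cycles
            + frac_ddr3 * ddr3_cycles)

print(f"{avg_latency(28):.1f}")   # 78.4 cycles on average
print(f"{avg_latency(32):.1f}")   # 79.6 cycles -- about a 1.5% difference overall
```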
 