esram astrophysics *spin-off*

Of course you can build something capable of multiple modes. But then you know of this capability before the silicon comes back from the fab, simply because it was designed that way.

Exactly. I have nothing technical to add to this discussion but general common sense tells me that if what Astrograd is saying were true then both Microsoft and AMD would have known it was a serious possibility at the design stage. And if there's a serious possibility that you can achieve a game changing double bandwidth then you'll be going after it with everything you've got from the start. You certainly wouldn't be surprised by the result.
 
Either it's possible, given the nature of small integrated circuits, for MS to have found an opportunity to read twice per clock cycle as astrograd says, or it's not, as Gipsel says.
If final silicon came back with unexpectedly high timing margins, MS could have safely increased clocks. But driving data over some interface twice in a clock cycle requires that the interface support this; it has to be designed for this to work. That's not something you discover after the silicon comes back from the fab.
As I said before, there are two possibilities in my opinion:
(i) The eSRAM was capable of the higher speed right from the beginning, but for some reason or miscommunication it wasn't the peak bandwidth that was quoted, only a reduced number, perhaps valid for a certain set of circumstances such as access only by the ROPs (which likely limits it to close to 128 Byte/clock) and no other bandwidth consumers. But if you do a DMA transfer between the eSRAM and the DDR3 at the same time, maybe the 25.6 GB/s can add to the ~109 GB/s (after the upclock of the ROPs), or you can still do a few texture reads from the eSRAM in parallel, because the actual bandwidth of the eSRAM was higher to begin with, not because of some double pumping MS just came up with as an ad-hoc solution after seeing huge timing margins in final silicon (see the quick sanity check below).
(ii) The less preferable version would be a miscommunication, now caused by some botched benchmarks, and the real bandwidth is still 102.4 GB/s (or 109.2 after the upclock).
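A quick sanity check of those figures, assuming the quoted numbers come from a 128-byte-per-clock eSRAM interface (the width is inferred here from the 102.4 GB/s figure, not stated anywhere officially):

```python
# Peak eSRAM bandwidth as interface width times clock, using the thread's figures.
# The 128 bytes/clock width is inferred from 102.4 GB/s at 800 MHz (assumption).
bus_bytes_per_clock = 128

for clock_mhz in (800, 853):
    peak_gb_s = bus_bytes_per_clock * clock_mhz * 1e6 / 1e9
    print(f"{clock_mhz} MHz -> {peak_gb_s:.1f} GB/s")
# 800 MHz -> 102.4 GB/s (the originally quoted figure)
# 853 MHz -> 109.2 GB/s (after the upclock)

# If the 25.6 GB/s DMA path to DDR3 can run in parallel, the totals would simply add:
print(f"combined: {109.2 + 25.6:.1f} GB/s")   # 134.8 GB/s
```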

What I can't believe is that AMD/MS didn't know what they were doing and completely unintentionally fluked their way into getting not a broken SRAM controller out of the process, but an SRAM array with higher performance than the theoretical peak given by the engineers beforehand. I mean, SRAM isn't exactly the hardest part to build and there is a lot of experience with it.
 
And if there's a serious possibility that you can achieve a game changing double bandwidth then you'll be going after it with everything you've got from the start. You certainly wouldn't be surprised by the result.
Part of Astrograd's argument is that we have only a single, third-hand editorial reference for that 'surprise'. We shouldn't place too much emphasis on the supposed BW increase coming as a complete surprise when formulating theories about the technical aspects of whether a BW increase is possible or not, given the known bus width and clocks.
 
That it can't be a surprise, and that the explanation given in that article for the bandwidth increase can't be true, was part of my reasoning right from the beginning (it should also be somewhere in the XB1 thread). I never questioned that the bandwidth may be higher than the 102.4 GB/s; I contested the given explanation.
 
Part of Astrograd's argument is that we have only a single, third-hand editorial reference for that 'surprise'. We shouldn't place too much emphasis on the supposed BW increase coming as a complete surprise when formulating theories about the technical aspects of whether a BW increase is possible or not, given the known bus width and clocks.

If we can place so little trust and weight in the source's explanation of what happened, then why are we even discussing that it happened? It seems that he is arguing we shouldn't trust what they are saying. Or is he trying to simultaneously argue that the source is so bad it doesn't know its own information nor how to explain it, but is also trustworthy enough to believe? (This seems like an oxymoron.)
 
Part of Astrograd's argument is that we have only a single, third-hand editorial reference for that 'surprise'. We shouldn't place too much emphasis on the supposed BW increase coming as a complete surprise when formulating theories about the technical aspects of whether a BW increase is possible or not, given the known bus width and clocks.

Astrograd's argument has all along been that you can take a chip manufactured with latches that only latch on one edge of the clock and, after discovering that the timing margins are good, somehow magically change those latches in the already-manufactured chip into latches that latch on both edges of the clock (something that would require a physical change to the circuit: not a change in timing margins, but a change as in removing transistors, adding transistors, and rerouting the metal wiring).

Of course, if they were designed for this from the start then that would be no problem. Nobody is arguing that the bandwidth can't be higher than the original figure. (That would be no problem at all to achieve.) Only that you can't, through wishful thinking, change one circuit into a different circuit after it has already been manufactured.
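As a toy illustration of why edge behaviour is a design-time property rather than a timing-margin property (purely illustrative, not a model of any real circuit):

```python
# Toy behavioural model: which clock edges a register reacts to is fixed by design.
def capture(samples_on, clock_edges, data_stream):
    captured = []
    for edge, data in zip(clock_edges, data_stream):
        if edge in samples_on:        # edges outside the design are simply ignored
            captured.append(data)
    return captured

edges = ["rise", "fall"] * 4          # four clock cycles
data = list(range(8))                 # new data available at every edge
print(capture({"rise"}, edges, data))          # single-edge design: 4 values captured
print(capture({"rise", "fall"}, edges, data))  # dual-edge design: 8 values captured
```

However much timing margin the silicon turns out to have, the first design never looks at falling edges; getting the second behaviour means building a different circuit.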
 
Given the source (DF has a track record of getting official info), I'm happy to accept that there is more BW than expected with the known bus size and clock.

DF said:
Well-placed development sources have told Digital Foundry that the ESRAM embedded memory within the Xbox One processor is considerably more capable than Microsoft envisaged during pre-production of the console, with data throughput levels up to 88 per cent higher in the final hardware.
We can take that as official. What's not official is the idea that the engineers were surprised.

However, with near-final production silicon, Microsoft techs have found that the hardware is capable of reading and writing simultaneously. Apparently, there are spare processing cycle "holes" that can be utilised for additional operations.
We know the maths behind the original BW calculations, so we know that any improvement in production silicon over pre-production plans isn't due to a change in bus width or frequency. The explanation of 'holes' could therefore point to something astrograd describes - if pre-production silicon planned for a given BW but engineers hoped to exploit a technique to extract more performance. If this were so, we'd expect to hear an announcement ("breakthrough memory interface extracts 88% more performance from SRAM scratchpads") followed by an IPC presentation, instead of the choice of language in the DF article that implies it was more of a happy accident. Without that, the only recourse is either to categorically disprove astrograd's theory as impossible and thereby debunk the whole article's position, or to find some other explanation for exploiting 'holes', or to prove astrograd's theory as possible and entertain the probability that this is some new exploitation of silicon to extract more performance.
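For reference, here is the rough arithmetic implied by those figures, assuming the 88 per cent uplift is measured against the original 102.4 GB/s design figure (my assumption; the article doesn't spell out the baseline):

```python
original_bw = 102.4    # GB/s, pre-production figure
upclocked_bw = 109.2   # GB/s, same interface after the upclock

print(f"88% over {original_bw} GB/s -> {original_bw * 1.88:.1f} GB/s")   # ~192.5 GB/s
print(f"read + write every cycle    -> {2 * upclocked_bw:.1f} GB/s")     # 218.4 GB/s ceiling
# An interface doing simultaneous reads and writes but unable to fill every
# "hole" would land somewhere between those two numbers.
```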
 
Only that you can't change one circuit into a different circuit after it has been already manufactured through wishful thinking.
He's not using wishful thinking as justification, but quantum physics explanations. His theory can and should be discussed in that domain (in electronic engineering) in this thread.
 
He's not using wishful thinking as justification, but quantum physics explanations. His theory can and should be discussed in that domain (in electronic engineering) in this thread.
Really? Where does he explain how quantum physics can cause the physical configuration of transistors and wires to spontaneously change?
 
Really? Where does he explain how quantum physics can cause the physical configuration of transistors and wires to spontaneously change?
I don't know enough about chip physics or quantum physics to be a part of this conversation. If you understand everything astrograd is saying, argue with him about how the quantum tunnelling effect cannot be used to register a second data pulse per clock. ;) Please either partake in the conversation at the level astrograd and Gipsel et al have taken it, or just watch. "Wishful thinking" and "make believe" responses abound on the internet - it'd be nice to have one place where people actually discuss the theories and practicalities instead of the usual dismissal of people as "so wrong it's not worth even talking about".
 
Really? Where does he explain how quantum physics can cause the physical configuration of transistors and wires to spontaneously change?
I don't know enough about chip physics or quantum physics to be a part of this conversation. If you understand everything astrograd is saying, argue with him about how the quantum tunnelling effect cannot be used to register a second data pulse per clock. ;) Please either partake in the conversation at the level astrograd and Gipsel et al have taken it, or just watch. "Wishful thinking" and "make believe" responses abound on the internet - it'd be nice to have one place where people actually discuss the theories and practicalities instead of the usual dismissal of people as "so wrong it's not worth even talking about".
Okay. To spell out the answer to the rhetorical question Thowlly asked: No quantum mechanical effect can achieve this, no matter how hard one tries. The whole quantum mechanics discussion is merely a smoke screen and completely irrelevant to the question of whether some circuit can change its behaviour to something it is really designed not to do (not: "not designed to do" ;)).
And an interface not designed to transmit data twice per clock most certainly can't do it (as it usually requires more or different hardware effort to implement this). I would estimate the probability of this happening by lucky accident (the hardware engineers didn't design this capability, it just happens to work by some miracle) to be almost as low as tunneling a tennis ball at 60 mph through a 1-inch-thick steel plate (common sense should tell everyone that this is pretty improbable; I guess nobody has seen it so far). As Thowlly implied, no quantum mechanics in the world can change the physical implementation of a circuit after it has left the fab.
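To put a rough number on that analogy, a crude WKB-style estimate (the effective barrier for a tennis ball against a steel plate isn't well defined, so the 1 J excess used below is purely an illustrative assumption; the conclusion doesn't depend on it):

```python
import math

hbar = 1.054e-34       # J*s
mass = 0.057           # kg, roughly a tennis ball
thickness = 0.0254     # m, a 1-inch plate
barrier_excess = 1.0   # J above the ball's kinetic energy (illustrative assumption)

kappa = math.sqrt(2 * mass * barrier_excess) / hbar
exponent = 2 * kappa * thickness       # tunnelling probability ~ exp(-exponent)
print(f"suppression exponent ~ {exponent:.1e}")   # on the order of 1e32
# exp(-1e32) is zero for any practical purpose, which is the point of the analogy.
```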
 
First, good job Shifty on trying to keep this thread in order, always appreciated!

Secondly, I also (like someone else mentioned) am in awe of people who know maths at a high level; those people are admired by me :)

Thirdly, and post related...
I don't want to negate all the interesting perspectives we have gotten from astrograd or gipsel or others, but perhaps we should look at the Occam's razor answer here?

What is the simplest explanation for how MS got the "bumped up" performance of the ESRAM?

So the question would be: who designed/produced the ESRAM for MS?
Could it be something along the lines of: MS's tech guys thought they would get x performance, but AMD (or whoever the partner was) knew they could do more with it and made the changes when giving samples back to MS, and while testing, MS found out that they had more performance than x?

(Ok, this is perhaps not the Occam's razor answer, but somewhere the "surprised" comments have to be explained. Or it is just that DF used bad wording and MS was really not surprised, but they did make a change on the samples that they hoped would give them more speed, and the samples came back positive..)

I remember Phil Spencer saying that MS went in and customized the tiniest piece of silicon in the machine to optimize performance..

anyways, fascinating thread! :)
 
I would think the most probable explanation would be some friction losses in a large company like MS on such a large project as the XB1 with internal and external (AMD) engineers putting together a complex system. I'm pretty sure the hardware guys know their stuff and were always aware of the possibilities of the hardware. You simply don't double your bandwidth by accident. But as the documentation for devs got written (probably by some developer relations guys with input and help from the hardware design people), that information may have been partially lost. Or they only looked at some specific scenario to arrive at that number. The vgleaks docs show that important changes are often given in comparison to the XB360. And there, the embedded RAM could only be accessed by the ROPs. So maybe they gave just the bandwidth usable by the color backends of the ROPs without saying this explicitly. Who knows? At least this scenario looks conceivable to me and doesn't involve extremely lucky accidents or some extreme conjectures about their circuit design.
 
I would think the most probable explanation would be some friction losses in a large company like MS on such a large project as the XB1 with internal and external (AMD) engineers putting together a complex system.

Along with issues such as security and secrecy adding variables to this already complex equation (not a real equation, despite the OP).
 
Back to the beginning:

You design all 8 cycles to double pump if that's your goal, sure. But this wasn't what we got here. If my theory on what happened is right, I don't see anything particularly shocking about it. Just good luck on MS's part. It could have very well been something that only popped up as they were finalizing their production testing.

For instance, you can get faster state changes in the transistors by making them smaller due to the way quantum tunneling works. Maybe they got the eSRAM smaller, more dense during some production test runs and didn't realize it was small enough to have the states be capable of switching so fast as to open the window on the pulse for double pumping? It's possible.

a. Not designed to double pump; MS just got lucky.

So if things went the way one would think, MS got a final sample, saw that "timing and tolerances" were such that double pumping could be done, and then reconfigured the controlling circuitry (well, a simplified version for my head to get around) around this new playing field. But that is not what is being said here. If there is no redesign, then the controlling circuitry must already be fast enough and configurable enough to be flexible enough to pull off this 88% increase.

b. Maybe MS made the eSRAM smaller, more dense.

So the eSRAM on the 28nm process would be packed closer or made smaller. Now, I assumed that the spacing and size of the transistors would be fixed at this lower bound. So either the transistors are redesigned in some way, or reconfigured in some way, so as to allow a smaller and/or more efficiently packed array.

I don't assign probabilities to these things; I'm just trying to sort the wheat from the chaff.
 
Simply put, the first one is highly unlikely. No matter how tight the timings are, if the interface was designed to read only on the rising or falling edge, that's all it can do. I guess you could in theory tell it to blindly do something at the point you assume is the falling edge, but that's a terrible idea. Say your calculation of where the falling edge is has an error of .00000001%. Looks like a very small number, doesn't it? Until you realize that error at 1GHz will be added a billion times a second. You have 10% drift possible in one second, and that could be very, very bad. Only getting worse unless you want to waste the cycles recalculating really often.

Not only that, but it's + or - 10%, which is really a 20% range, and it can be anywhere in there. Needless to say that is bad, and recalculating will cost cycles. These things are complicated enough as is; you don't want to introduce more error into the equation.
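A minimal sketch of that arithmetic, assuming the edge-guess error really is a fixed fraction of the clock period applied open-loop every cycle (that model is just the scenario described above, not a measured figure):

```python
clock_hz = 1e9              # 1 GHz, as in the post
per_cycle_error = 1e-10     # 0.00000001% of a clock period, per cycle

drift_per_second = per_cycle_error * clock_hz      # accumulated, in clock periods
print(f"drift after 1 s: {drift_per_second * 100:.0f}% of a clock period")  # 10%
# With a +/-10% uncertainty band the guessed edge can wander across 20% of the
# period within a second unless it is periodically recalibrated.
```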
 
To be fair,

I worked on devices where we had a certain bus controller and storage device that was capable of operating at a variety of clocks, voltages, and both SDR and DDR modes. This was all configurable through software by setting some hardware registers.

Was it double pumping SDR? I would like to keep terms like 'DDR mode' out of the conversation, because the technique of operation is double pumping; DDR is the spec classification for silicon designed from the ground up to achieve that efficiently and reliably. It's purely semantics, I know, but I think some people here have conflated the two and ignored the distinction, causing them to focus their skepticism on expectations about circuitry designs instead of actual operation (which is the claim made regarding the eSRAM boost).

We ended up picking which clock and mode we shipped in based on thermals, power consumption, and stability.

Interesting. Thanks.
 
AFAIK "double pumping" isn't exactly a very technical term and can refer to different things. Do you mean a doubled clock? Or do you mean DDR signaling, i.e. transfers at both the rising and the falling edge of the clock signal?
 
If the DF info is true, then it could be the result of separate read and write buses to the ESRAM. The ESRAM could be internally organized as 8 separate banks.

You could then perform a read and a write every cycle as long as there is no bank conflict. The first access is always allowed; the second has a 7/8 chance (87.5%) of not conflicting with the other access.

Cheers
 
If the DF info is true, then it could be the result of separate read and write buses to the ESRAM. The ESRAM could be internally organized as 8 separate banks.

You could then perform a read and a write every cycle as long as there is no bank conflict. The first access is always allowed; the second has a 7/8 chance (87.5%) of not conflicting with the other access.

Cheers
The eSRAM is surely banked. And GPUs tend to do quite a bit of load balancing (here one just has to find a proper interleaving scheme between banks) so they are not hit too hard by bank conflicts (it hurts DRAM performance too if all requests end up in the same channel ;)). This is something MS and AMD would have surely known before getting silicon back. And usually just the peak bandwidth under the assumption of no bank conflicts is given. The same is true for cache bandwidths or, in GPUs specifically, the local memory bandwidth.
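Incidentally, that simple two-accesses-per-cycle banking model already lands close to the quoted 88 per cent on its own; a quick Monte Carlo sketch, assuming uniformly random and independent bank selection for the two accesses (an idealisation, real access patterns would be deliberately interleaved):

```python
import random

BANKS = 8
CYCLES = 1_000_000

served = 0
for _ in range(CYCLES):
    read_bank = random.randrange(BANKS)    # bank targeted by this cycle's read
    write_bank = random.randrange(BANKS)   # bank targeted by this cycle's write
    served += 1                            # the first access always goes through
    if write_bank != read_bank:            # the second only if there is no conflict
        served += 1

avg = served / CYCLES
print(f"average accesses per cycle: {avg:.3f}")               # ~1.875
print(f"uplift over a single access: {(avg - 1) * 100:.1f}%")  # ~87.5%
```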
 