esram astrophysics *spin-off*

That wasn't the case for many posters here. There was serious debate here on what was being measured, the reasons why it was described as it was, and the circumstances of the discovery.

The DF article was dismissed roundly here. I agree not many here said DF made it up, but there was discussion in various other places that asserted as much, including comments made by ppl who do happen to post here iirc. If nothing else, hopefully Hot Chips compels those of you who assumed the DF article was totally wrong somehow to take the article seriously and give it another look.

Can you state what that prediction was again, and what parts of the disclosure at Hot Chips are you counting as verification?

The hypothesis was that the eSRAM's bandwidth was much higher than ppl had presumed and specifically that it was 192GB/s peak. The supporting evidence for this was what DF was told by their dev sources, as well as what I was told, which corroborated their info and numbers to a tee. This info leads to the prediction that the bandwidth ranges from the presumed value of 109 GB/s (up from 102 GB/s due to the clock boost) up to 204 GB/s (up from 192 GB/s, again per the clock boost). The HotChips presentation confirmed the prediction was correct. That doesn't speak to the mechanism though.
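For reference, the clock-scaling arithmetic behind those figures is trivial to check. A minimal sketch, assuming only that bandwidth scales linearly with the 800 MHz to 853 MHz upclock:

```python
# Minimal sketch: linear scaling of the quoted figures with the upclock.
scale = 853 / 800            # Xbox One GPU clock bump
print(102 * scale)           # ~108.8 -> quoted as 109 GB/s
print(192 * scale)           # ~204.7 -> quoted as 204 GB/s
```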

There are elements of interpretation to any high-level diagram, and that interpretation needs to be stated and defended.

Stated to whom? Defended from whom? The audience wasn't there to learn that double pumping existed. It's only a 30-minute talk and it covered the entire platform. Nor were they there to bombard MS with questions within the talk's format. Nor has any outlet bothered to ask MS for clarification, or even to interview someone about the figure.

I am also interested in knowing which parts of your claims in this thread you are saying were proven, as you have made claims to the mechanism that have implications.

Don't confuse claims with speculation. The only claim I have made as to the specific mechanism is that it pertains to timings. The speculation put forward was about quantum effects that might lead to that, as DF's source told them that MS was suggesting it was something that came about via manufacturing. Hence, the speculation in that context was focused on 'stuff engineers wouldn't expect', which most certainly includes quantum effects usually. Again though, that's open discussion and speculation. Not a claim of fact.

What I was saying was that DF's article, which was utterly dismissed by the majority here and elsewhere, was accurate based on what I had been told ~1 month prior to their article.

Those implications lead to some kind of performance model that can be compared to the admittedly patchy information about the sustained performance of the implementation.
It's not just matching the arithmetic of 7/8; there were other parts of the DF article, and others, about the performance profile that need to be reconciled.

I agree. But ya gotta start somewhere and the 7/8 cycle concept works with what I was told about timings being important, with DF's source's account from MS's info on the subject, and it seems natural in light of the notion it was related to a surprise during manufacturing. This is how new understandings of stuff previously taken for granted happen in science: starting with disparate bits of info and speculating as to how they might be connected in a coherent manner.
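A minimal sketch of the 7/8 cycle arithmetic referenced here, assuming the model is a read every cycle plus a write on 7 of every 8 cycles (an assumption of the theory, not a disclosed detail):

```python
# Hypothetical 7/8 double-pump model: read every cycle, write 7 of 8.
base = 109.0                     # GB/s in one direction (quoted figure)
peak = base * (1 + 7 / 8)        # effective 15/8 multiplier
print(peak)                      # ~204.4 GB/s, matching the disclosed peak
```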
 
Well in your reply to me above you assert MS are simply double-pumping the bus.

That's what DF was told by their dev sources, and it fits perfectly in line with what I was told about 'timings'. I don't think read + write on some of the cycles is something particularly controversial all of a sudden. Certainly not to the point of derailing their talk on the entire platform just to alert the audience that such a phenomenon existed (which is old news obviously).

That's the mechanism that hasn't been proven via conversation.

I disagree (ignoring semantics about what is 'provable'). I think that much is the only thing that fits with the reliable info. The mechanism I was referring to is what allows for double pumping.

You've gotten as far as saying, "it could be double pumping that doesn't work some of the time and at most works 7/8 cycles," which is somewhat confusing as surely MS knew they were designing this double-pumped bus and so the results should be consistent and predictable?

Why assume this? It could have been something that they didn't realize related to the scaling of the silicon during manufacturing. It could have been that they considered it possible a priori, but felt it was unlikely that the manufacturing process would go well enough to actually see that hypothesized benefit and THAT was what surprised them. In other words, maybe they were surprised about yields being high enough to make it work and not surprised that such a thing was possible in the first place.

Or are you suggesting that MS designed the double-pumped bus but couldn't guarantee performance, so listed the lowest possible figure in their specs, and then when they got final silicon and found what proportion of the clock cycles could be reliably used, updated their specs? :???:

I've openly put forward this option as well. It'd be fantastic if someone in the press would just ask MS about the figure (interference should ask Leadbetter about this perhaps). Maybe Penello could dig around and comment on GAF about it, since it's a confirmed, announced spec at this point.
 
The issue is that the theoretical peak is lower than the port width and clock would suggest.
An SRAM with a read and a write port should, in the perfectly staggered read and write scenario, reach 218 GB/s.

Also, what about the eSRAM makes it require such close coordination? There should be many shaders running concurrently, so what about the eSRAM interface prevents scheduling around conflicts?
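A quick check of that ceiling, assuming two independent 128-byte ports at 853 MHz (the 128-byte width is inferred from the 109 GB/s per-direction figure, not a disclosed detail):

```python
# Idealized dual-port ceiling: both ports active every cycle.
clk_hz = 853e6
port_bytes = 128                          # assumed width per port
print(2 * clk_hz * port_bytes / 1e9)      # ~218.4 GB/s vs. the disclosed 204
```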

A true multiport design would be expensive in many ways.

Assuming that the design used is banked multiport, then it makes sense that the true peak is less than interface × clock, since there's hashing and arbitration involved. Or perhaps this is a value measured by crafting a perfectly staggered write/read program execution.

I would imagine that you can run multipass using the SRAM as the frame buffer and stagger the passes.
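Interestingly, a toy model of such a banked design lands close to the disclosed multiplier: with 8 independent banks and random addresses, a read and a write collide 1/8 of the time, giving 15/8 accesses per cycle on average, the same factor separating 109 and 204 GB/s. A hedged sketch (the bank count and conflict rule are illustrative assumptions, not disclosed details):

```python
# Toy banked-eSRAM model: a read and a write issue each cycle, but they
# can only proceed together if they target different banks.
import random

BANKS = 8                  # assumed bank count
CYCLES = 100_000
accesses = 0
for _ in range(CYCLES):
    read_bank = random.randrange(BANKS)
    write_bank = random.randrange(BANKS)
    # On a conflict only one access proceeds; otherwise both do.
    accesses += 1 if read_bank == write_bank else 2

print(accesses / CYCLES)   # ~1.875 (= 15/8) accesses per cycle on average
```

Whether the real design has 8 banks is pure speculation; the match could just be coincidence.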
 
A true multiport design would be expensive in many ways.

Assuming that the design used is banked multiport, then it makes sense that the true peak is less than interface × clock, since there's hashing and arbitration involved. Or perhaps this is a value measured by crafting a perfectly staggered write/read program execution.

I would imagine that you can run multipass using the SRAM as the frame buffer and stagger the passes.

A Google search of the term "banked multiport" yielded this .pdf which contains some examples of different approaches for allowing multiple access to a single data pool. Pretty much all over my head, but hopefully it's enlightening for some. Do any of the other examples look applicable to the XBOne's ESRAM setup?
 
A true multiport design would be expensive in many ways.

Assuming that the design used is banked multiport, then it makes sense that the true peak is less than interface × clock, since there's hashing and arbitration involved. Or perhaps this is a value measured by crafting a perfectly staggered write/read program execution.

I would imagine that you can run multipass using the SRAM as the frame buffer and stagger the passes.

A multiport/dual-port design would certainly explain the high transistor count. It wouldn't surprise me if the SRAM took up around half of that budget, if not more.
 
Being 8T, or whatever it takes for the eSRAM to read/write simultaneously, would certainly be something they knew about years ago and not discovered recently. Also, why the nonsense about "holes" from the DF article?

However, with near-final production silicon, Microsoft techs have found that the hardware is capable of reading and writing simultaneously. Apparently, there are spare processing cycle "holes" that can be utilised for additional operations.
 
The hypothesis was that the eSRAM's bandwidth was much higher than ppl had presumed and specifically that it was 192GB/s peak.
That restated what the text said.
This thread was not spun off on a tangent titled "DF says 192GB/s peak", however.

This info leads to the prediction that the bandwidth ranges from the presumed value of 109 GB/s (up from 102 GB/s due to the clock boost) up to 204 GB/s (up from 192 GB/s, again per the clock boost). The HotChips presentation confirmed the prediction was correct. That doesn't speak to the mechanism though.
This is also an observation of what the text stated in the DF article, but if the 102, 192, and 204 numbers are all that your claims were composed of, then sure.

Stated to whom? Defended from whom? The audience wasn't there to learn that double pumping existed.
Here's where something is actually introduced that attempts to explain something based on what was observed.
Which parts of the MS disclosure corroborate this hypothesis?
Claiming double-pumping actually has implications that can, in concert with other pieces of data, create hypothetical performance profiles that we can then compare with the observed information.

Don't confuse claims with speculation. The only claim I have made as to the specific mechanism is that it pertains to timings. The speculation put forward was about quantum effects that might lead to that, as DF's source told them that MS was suggesting it was something that came about via manufacturing.
Can you define what you mean by manufacturing?
Bug fixes, process tweaks, new steppings? I believe you were focused on physical improvements as a result of properties shifting at the level of the atomic layers of the transistor gate stack.
Which sentences in the DF article said manufacturing of the sort you claimed did it?

Hence, the speculation in that context was focused on 'stuff engineers wouldn't expect', which most certainly includes quantum effects usually. Again though, that's open discussion and speculation. Not a claim of fact.
If the argument is that anything beyond the numbers 109 and 204 has not been substantiated that well, that does rule out most of the meat in this thread.



I agree. But ya gotta start somewhere and the 7/8 cycle concept works with what I was told about timings being important,
Quantum mechanics aside, stating 7/8 and double pumping, in concert with other data presented allows for the derivation of hypothetical performance profiles.
Those can then be compared to other implementations, how they are described, and to the sustained bandwidth behavior of the eSRAM that was also included in the DF article and elsewhere.

If there is merit to that concept, shouldn't the system's sustained behavior and the hypothesized behavior have some similarity?

A true multiport design would be expensive in many ways.

Assuming that the design used is banked multiport, then it makes sense that the true peak is less than interface × clock, since there's hashing and arbitration involved. Or perhaps this is a value measured by crafting a perfectly staggered write/read program execution.
Arbitration can be pipelined and bank conflicts can be ignored for the purposes of disclosing peak performance.
Events that lead to less than idealized performance are figured into the sustained performance figure.
 
There seem to be a fair number of typos in that piece. :???:

Why are you so emotionally invested in this?

For someone who professes to have an academic background, surely you should understand and appreciate the need for sourcing information. If Bristol-Myers' new nutritional product is presented as an innovative new approach to diabetes in an article written for a respected journal, without any sourcing or explanation of the mechanism supporting the claim, and then at the next big endocrinology conference they don't even mention the product, people will have questions. It doesn't mean they are lying, but it doesn't mean they are immune from questions either.

B3D should be less about circling the wagons to protect the brand I favor and more about communicating about the tech, how it works, and the pros and cons of various applications. There is so little information available that very little meaningful discussion can even occur, yet some appear to be offended at even pointing that out.
 
So does this boil down to MS is using Nvidia's patent? Does MS have to disclose IP they use, can we look it up?

http://www.google.com/patents/US7643330

By double-pumping the SRAM storage cells, one read access and one write access are possible per clock cycle, allowing the SRAM to present two external ports, each capable of performing one transaction per clock cycle.

Which of course doesn't explain the 7/8 hand waving.
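For what it's worth, the arithmetic shows why: straight double-pumping alone predicts a clean doubling, which overshoots the disclosed figure (all numbers are the thread's quoted ones):

```python
# Straight double-pumping would double the one-direction figure.
print(2 * 109)        # 218 GB/s if every cycle could both read and write
print(204 / 109)      # ~1.87, i.e. roughly 15/8 -- hence the "7/8" factor
```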
 
So does this boil down to MS is using Nvidia's patent? Does MS have to disclose IP they use, can we look it up?
MS using an Nvidia GPU patent in an AMD APU?
Kinky.

They aren't required to list all the patents they use. I think the trick is to implement whatever you want without looking.

Which of course doesn't explain the 7/8 hand waving.
It doesn't explain the 7/8, and it also doesn't explain the "surprise" factor. The patent covers something that would be very explicitly designed in.

I'm curious if there would be implications for SRAMs as large as Durango's and on a leading-edge or nearly so process. Not sure if that sort of knowledge would be something that could be disclosed.
 
MS using an Nvidia GPU patent in an AMD APU?
Kinky.

They aren't required to list all the patents they use. I think the trick is to implement whatever you want without looking.


It doesn't explain the 7/8, and it also doesn't explain the "surprise" factor. The patent covers something that would be very explicitly designed in.

I'm curious if there would be implications for SRAMs as large as Durango's and on a leading-edge or nearly so process. Not sure if that sort of knowledge would be something that could be disclosed.

To be clear... MS and NVIDIA have had a strong relationship for well over a decade, and MS uses NVIDIA technology, not necessarily its GPU, in Xbox... I can imagine there are patents in Xbox One from NVIDIA that MS has license rights to.

http://allthingsd.com/20110604/everybody-chill-the-nvidia-microsoft-pact-is-actually-11-years-old/
 
To be clear... MS and NVIDIA have had a strong relationship for well over a decade, and MS uses NVIDIA technology, not necessarily its GPU, in Xbox... I can imagine there are patents in Xbox One from NVIDIA that MS has license rights to.

http://allthingsd.com/20110604/everybody-chill-the-nvidia-microsoft-pact-is-actually-11-years-old/

I'm not ruling out really complex contract gymnastics, but the evidence suggests Microsoft isn't the one manufacturing and selling the APU.

The scope of any patent agreement with AMD may be more relevant, and whether a patent of this type that covers a general physical design would be something that would fall under a more generic covenant not to sue over graphics-related functions.
 
Arbitration can be pipelined and bank conflicts can be ignored for the purposes of disclosing peak performance.
Events that lead to less than idealized performance are figured into the sustained performance figure.

I have no idea, circuitry is really not my field, I'm a software person.

My EE PhD friend had some ideas around the "missing cache" of 11MB (out of 47MB, he only counted 32MB + 4MB) and how it might be used as a stream buffer, but we haven't talked in depth about it.

Quite an irony that if MSFT had said that the peak is 218 GB/s then there would be fewer questions ;)
 
I have no idea, circuitry is really not my field, I'm a software person.

My EE PhD friend had some ideas around the "missing cache" of 11MB (out of 47MB, he only counted 32MB + 4MB) and how it might be used as a stream buffer, but we haven't talked in depth about it.
It was 47MB of "storage". That can include a number of other things besides the eSRAM and CPU L2 caches.
There are register files (3MB in GPU vector registers alone), various buffers, and any other storage pools that could figure in as well.

Quite an irony that if MSFT had said that the peak is 218 GB/s then there would be fewer questions ;)

Probably.
If they had just stated that, it would have been consistent with how other memory pools are described and could have been chalked up to imprecise language.
People would look at the leaked diagrams and the ambiguous description of the read/write capability and just think "oh, that number was in each direction", at least until people developing for it started asking why its average performance on real code was a little over half what was claimed.

It's the imprecision of the language and the insistence on special cases where there normally wouldn't be any that make me ask why all the extra song and dance.
 
This may be nonsense as this isn't my field BUT I found this comment interesting over at the extremetech xb1 article -

"The math on ESRM isnt odd. You can write or read at 109GB/s, but when you read and write at the same time you need a byte (or something), when you read or when you write for control (or for something).
(109x2) - (109/8) = 204 GB/s"

Totally confused by his logic BUT thought I'd share in case anyone here has input!

link to comment - http://www.extremetech.com/gaming/1...d-odd-soc-architecture-confirmed-by-Microsoft

(I apologise upfront if this calculation has already been discussed here)
 
In that scenario, it would be about 16 bytes used per read+write cycle.
In order for it to impinge on realized bandwidth, command data would have to spill onto the data lines.
There are packetized links that have messaging overhead built into what they send, which might be somewhat along those lines.
More information would be needed on the nature of the interface and how it receives commands.

For the caches like the GPU's L2, command signals don't normally filter into the data lines, since they have their own pathways.
It also wouldn't go any further in explaining the rest of the gap mentioned between benchmarking code and the reduced peak, meaning there would be more to the story.

If commands are sent separately and in their own sideband, a queue that collects commands in the interface could figure it out on the fly without additional bytes being lost. This is what existing control logic already does elsewhere, so the departure would be an interesting talking point.
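A quick sanity check on the 16-bytes-per-cycle figure mentioned above, assuming a 128-byte-per-cycle interface at 853 MHz (which is what the ~109 GB/s per-direction number implies):

```python
# The commenter's 109/8 term expressed in bytes per cycle.
clk_hz = 853e6
bytes_per_cycle = 109.2e9 / clk_hz        # ~128 bytes each direction
overhead_bytes = (109.2e9 / 8) / clk_hz   # the 109/8 GB/s term
print(bytes_per_cycle, overhead_bytes)    # ~128.0, ~16.0
```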
 
This may be nonsense as this isn't my field BUT I found this comment interesting over at the extremetech xb1 article -

"The math on ESRM isnt odd. You can write or read at 109GB/s, but when you read and write at the same time you need a byte (or something), when you read or when you write for control (or for something).
(109x2) - (109/8) = 204 GB/s"

Totally confused by his logic BUT thought I'd share in case anyone here has input!

link to comment - http://www.extremetech.com/gaming/1...d-odd-soc-architecture-confirmed-by-Microsoft

(I apologise upfront if this calculation has already been discussed here)

Maybe I'm wrong but:
(109 GB/s¹ + 109 GB/s²) - (109/8 GB/s³) ≈ 204 GB/s

¹. 109GB/s read
². 109GB/s write
³. 1 byte (8 bits)

:oops:
 
That restated what the text said.
This thread was not spun off on a tangent titled "DF says 192GB/s peak", however.

This thread is about speculation as to how they got where they did. Obviously we agree on that as my post was the offshoot comment leading to the thread being created/separated. Note also that my original reply as to the HotChips info was actually in the general X1 hardware investigation thread, not this one. Shifty moved it here. It wasn't ever saying the mechanism was confirmed, just the result. I'd also add that double pumping in some fashion clearly seems to be confirmed as well based on the specs we do have. Surely you can agree with that too.

This is also an observation of what the text stated in the DF article, but if the 102, 192, and 204 numbers are all that your claims were composed of, then sure.

My claim was that timings somehow allowed MS to enable read + write on the same cycle and dramatically up the bandwidth of the eSRAM. That is what I was told, along with bandwidth numbers. I chose not to be vocal about my info until the DF article surfaced (Rangers and others can attest to the fact I knew well before DF did). Let's not confuse claims (which are now confirmed) with speculation here (which is still almost entirely unverified).


Here's where something is actually introduced that attempts to explain something based on what was observed.
Which parts of the MS disclosure corroborate this hypothesis?
Claiming double-pumping actually has implications that can, in concert with other pieces of data, create hypothetical performance profiles that we can then compare with the observed information.

I was specifically told it was read + write on the same cycle. If it was dual ported, that would have been known from day 1 and they would have only listed the peak and not the 'min' alongside it, so the only other option I'm aware of is double pumping. DF was told the same as per their article. So we have both very precise figures along with read + write on the same cycle, both from my source and from DF's dev source info, supposedly from MS. MS confirmed DF's claim and the figures I got. Charlie has also corroborated what I had been told about real-world figures (I was told 142 GB/s, he heard ~140 GB/s).

Can you define what you mean by manufacturing?
Bug fixes, process tweaks, new steppings? I believe you were focused on physical improvements as a result of properties shifting at the level of the atomic layers of the transistor gate stack.

Right, I had made the point that shrinking things down (like circuit elements) can typically lead to adjustments in the quickness of state changes, due to quantum effects becoming more and more important as you shrink your elements. That's only related to the eSRAM gains by means of my speculation though. I never made the claim that it got its bandwidth boosted by means of such a mechanism. Again, don't confuse open questioning and speculation with a conclusion. No conclusion/claim was being made as to the details outside of me asserting it was possible.

Which sentences in the DF article said manufacturing of the sort you claimed did it?

I never said DF mentioned anything specific about manufacturing.

If the argument is that anything beyond the numbers 109 and 204 has not been substantiated that well, that does rule out most of the meat in this thread.

I wouldn't say it rules anything about the mechanism out. But again, my post after HotChips was in the general X1 hardware thread... Shifty moved it here even after I expressed my view it should stay there, and my commentary therein wasn't saying the speculation about the mechanism was confirmed, just the figure itself and that DF's article was correct in that area.

Quantum mechanics aside, stating 7/8 and double pumping, in concert with other data presented allows for the derivation of hypothetical performance profiles.
Those can then be compared to other implementations, how they are described, and to the sustained bandwidth behavior of the eSRAM that was also included in the DF article and elsewhere.

If there is merit to that concept, shouldn't the system's sustained behavior and the hypothesized behavior have some similarity?

It does, once you account for the upclock.

(133 GB/s)*(853/800) = 142 GB/s

...which is entirely in line with what Charlie was told, per his article today. The peak value also scales identically.

(192 GB/s)*(853/800) = 204 GB/s

I also still speculate that the clock boost was limited by the timing mechanism here, whatever form it takes, since that math works out a bit too perfectly to brush off as mere coincidence imho. I view that clock boost figure as additional evidence to support what I was told about timings since if you boost too much you close that window for read + write to take place and going too high with the clock would erase all your double pumping.
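For reference, a sketch of that scaling using only the thread's quoted numbers; the sustained/peak ratio is unchanged by the upclock, consistent with pure linear scaling:

```python
# Upclock scaling of both the sustained and peak figures.
scale = 853 / 800
print(133 * scale)            # ~141.8 -> the ~142 GB/s real-world figure
print(192 * scale)            # ~204.7 -> the 204 GB/s disclosed peak
print(133 / 192, 142 / 204)   # ~0.69 either way
```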

Arbitration can be pipelined and bank conflicts can be ignored for the purposes of disclosing peak performance.
Events that lead to less than idealized performance are figured into the sustained performance figure.

As you have said before, clearly the 204 GB/s figure is achievable somehow, else there's no way MS would have claimed it as their peak at a symposium like HotChips. So we have both figures, which is rare and probably rather unique. OK, so why didn't they feel comfortable giving us only the peak value? I submit that it was likely because it is wholly misleading. If that is true, then why would it be more misleading than anyone else's figures, which are usually expressed as peak bandwidth? I submit that it's because this peak is only accomplished via exceedingly unrealistic conditions for game code. That leads me to stand by my 7/8 cycle double pump theory, as that would seem to me to likewise require unrealistic game code to max out.

Either way, if it was something they knew full well about all along, the VGLeaks info wouldn't be what it was and they'd likely feel comfy claiming only the peak value instead of seemingly feeling obliged to mention both. So why list both? If the peak is totally unrealistic and misleading, then it makes sense, and would seem to me to corroborate the idea that something non-ideal boosted the bandwidth (non-ideal in the sense that it wasn't a full, simple doubling of the bandwidth).
 