esram astrophysics *spin-off*

Status
Not open for further replies.
We all know the nature of the mystery. Whether MS are using creative PR or not is a matter of faith at this point. Can we please keep this thread to possible technical solutions to the mystery.

Because it's potentially a complete revolution within the EE sector!! Seriously, they've come up with a BW-enhancing feature that no-one can fathom, and they didn't talk about it at all? "Hi guys, welcome to HotChips. Today we'll be talking about a games console APU, which is much like any other AMD APU. One special feature we have is that we've found a way to get partial double transfers across a bus, boosting bandwidth by 30% or more. I see you're all pretty excited at that prospect! But we won't talk about that. Instead we're going to discuss some conventional specialist processing blocks in there." :p They didn't present any interesting tech at HotChips, neither their ToF sensor nor how they have achieved something no-one else has for boosting IO BW. I think a lot of us presumed MS would show something more interesting. I honestly don't know why they bothered showing what they did, because it doesn't reflect anything of interest within the EE sector. They kept out all the juicy bits.

This is speculation, but if the numbers are true, and MS did develop/discover a (really) special feature, is it not possible that they don't want to share it because they are afraid that other companies will copy it?

Nobody can fathom how it works, so I believe they really hit a goldmine through their technical engineering, if the numbers add up, that is.

(As for the ToF sensor: they would have to mention latencies, which will just be used against them so I understand why they want to keep the specs out of it this time)
 
So just kinda going off the numbers for the eSRAM:
853 MHz x 256 bits x 4 / 8 = 109 GB/s minimum BW
Typically SRAM would have idle cycles for R/W turnaround, so this "minimum" could imply that it's ZBT SRAM, meaning no cycles wasted, hence guaranteed.

The simultaneous R/W suggests a two-port (1R and 1W) SRAM as well.
Look at the 4 banks instead of 1 common bank: this seems like a deliberate design for banked multiport memory, so you can almost double the BW. But at the same time there's arbitration delay, and the actual internal ports available are probably fewer than the external ports.
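For the record, the arithmetic behind that minimum can be checked directly. A quick sketch, assuming the Hot Chips figures of an 853 MHz clock and four 256-bit lanes:

```python
CLOCK_HZ = 853e6      # post-upclock eSRAM clock (Hot Chips figure)
LANE_BITS = 256       # width of one lane
LANES = 4             # four banks shown in the diagram

bytes_per_cycle = LANE_BITS * LANES // 8      # 128 B per cycle, one direction
min_bw_gbps = CLOCK_HZ * bytes_per_cycle / 1e9
print(f"{min_bw_gbps:.0f} GB/s")              # 109 GB/s, the guaranteed one-direction floor
```

That the quoted minimum exactly equals one full-width transfer every cycle is what makes the ZBT (no-turnaround) reading plausible.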
 
This is speculation, but if the numbers are true, and MS did develop/discover a (really) special feature, is it not possible that they don't want to share it because they are afraid that other companies will copy it?
If so, they'd patent it and license the tech. As they aren't chip designers, there's no reason for them to hoard the technology for their own use.

(As for the ToF sensor: they would have to mention latencies, which will just be used against them so I understand why they want to keep the specs out of it this time)
Please don't go off topic. Putting an OT in parentheses doesn't mitigate its impact on sidetracking discussion - ask your questions in the relevant thread.
 
Has it been confirmed if the esram is single or dual ported?
Or was it just an assumption based on the vgleaks docs?
Maybe those docs only gave one figure at that time, for a multitude of reasons.

Just wondering if it always may have had a range, and isn't necessarily something new.
I.e. what could be new is the change in range, not the fact it has a range.
 
Has it been confirmed if the esram is single or dual ported?
Or was it just an assumption based on the vgleaks docs?
Maybe those docs only gave one figure at that time, for a multitude of reasons.

Just wondering if it always may have had a range, and isn't necessarily something new.
I.e. what could be new is the change in range, not the fact it has a range.

The DF article said they discovered this, then they hand-waved about holes.
 
A deep dive in to Microsoft’s XBox One GPU and on-die memory

http://semiaccurate.com/2013/08/30/a-deep-dive-in-to-microsofts-xbox-one-gpu-and-on-die-memory/

I had always assumed this but glad Charlie seems to have verified this in his latest article..

On the SoC the 32MB eSRAM is not a traditional cache and it can hold a D3D surface. And a dev has access to this. After Build 2013, a developer conference for MS devs, I suspected MS would do this in XB1

And a bad dev would get the 109GB/s BW, and optimized code could get the theoretical 204GB/s BW
 
A deep dive in to Microsoft’s XBox One GPU and on-die memory

http://semiaccurate.com/2013/08/30/a-deep-dive-in-to-microsofts-xbox-one-gpu-and-on-die-memory/

I had always assumed this but glad Charlie seems to have verified this in his latest article..

On the SoC the 32MB eSRAM is not a traditional cache and it can hold a D3D surface. And a dev has access to this. After Build 2013, a developer conference for MS devs, I suspected MS would do this in XB1 ;)

And a bad dev would get the 109GB/s BW, and optimized code could get the theoretical 204GB/s BW
The part about bandwidth is just as hand-wavey as everyone else at this point. I don't think Charlie has any info we don't have.

Also, I only count 5 processors in the Audio block. Unless you're counting each SHAPE block as a processor, then I suppose it could be 8.
 
Because it's potentially a complete revolution within the EE sector!! Seriously, they've come up with a BW-enhancing feature that no-one can fathom, and they didn't talk about it at all?
The thing that piqued my interest was that there are a number of ways of doing this that don't require a revolution. What seemed unusual was that they've avoided describing the interface in the straightforward manner that others have used.

It's fine to ignore bank conflicts and corner cases that cause bandwidth to fall below peak and just say the bandwidth is X, or X in both directions.
It's normally only if it's the other way around that you shy away from using the peak bandwidth, and the wiggle room for how frequently you fall below peak before you don't use that in the chart is very generous.

The idea of a "surprise" 7/8 scenario is also kind of strange to me because I don't think the engineers can be surprised like that, and if we accept that the designers know what they are building why not 8/8? There might be something interesting in that.

On the other hand, we do know that other memory interfaces have performance optimizations that can give special-case boosts, and we know marketing can get cute with numbers.
 
The thing that piqued my interest was that there are a number of ways of doing this that don't require a revolution. What seemed unusual was that they've avoided describing the interface in the straightforward manner that others have used.
That's very valid. A lot of the discussion has been coloured by DF's editorial. If we look at the pure facts and appreciate the details of the interface are missing, I guess there may be more obvious explanations, not that I know what any of them are.
 
The part about bandwidth is just as hand-wavey as everyone else at this point. I don't think Charlie has any info we don't have.

It's interesting that the eSRAM is described as being so straightforward and low overhead that it doesn't need a lot of tricks to be utilized well, then it is later explained that code can readily chop its performance in half and you have to sacrifice a goat to get peak.
He also apparently didn't note that, prior to the upclock, the given peak number for the interface was 192 GB/s.

He also is apparently using a 1.6 GHz clock in his math, but if that's true, why would read-only or write-only traffic halve the bandwidth?
Why would any architectural peak or minimum start worrying about whether every value in a cache line is utilized? If that's Charlie's logic, it's trivial to just write one pixel value to an eSRAM line and gut performance to way below half.

How is the eSRAM being addressed or exported to where it's so hard to get full bandwidth?
Have two separate shaders, one that writes nice chunks of data to eSRAM and another that reads nice chunks back.
 
I just read through part 2 of the SA article; frankly, I now think he doesn't know what he's talking about, judging from all the misquoted speeds, terms, and so on.

Since the chart says 4 x 256b R&W, I think the 2-port SRAM makes sense. In reality no real-world code will come close to the 204 GB/s; you'd need to stagger the writes and reads so perfectly that you're basically just using it as a buffer and streaming the data at that point.

I don't really buy the cycle "holes", but if MSFT had discovered something and they are in the process of filing a patent for it, it makes sense that they don't talk about it in detail.
 
Since the chart says 4 x 256b R&W, I think the 2-port SRAM makes sense. In reality no real-world code will come close to the 204 GB/s; you'd need to stagger the writes and reads so perfectly that you're basically just using it as a buffer and streaming the data at that point.
The issue is that the theoretical peak is lower than the port width and clock would suggest.
An SRAM with a read and a write port should, in the perfectly staggered read-and-write scenario, reach 218 GB/s.

Also, what about the eSRAM makes it require such close coordination? There should be many shaders running concurrently, so what about the eSRAM interface prevents scheduling around conflicts?
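The gap being described here can be made explicit with a quick check, assuming the 853 MHz clock and 128 B/cycle per direction implied by the 4 x 256-bit lanes:

```python
clock_hz = 853e6               # post-upclock eSRAM clock (assumed)
bytes_per_cycle = 128          # 4 x 256-bit lanes, one direction

one_way = clock_hz * bytes_per_cycle      # ~109.2 GB/s, the quoted minimum
dual_port_peak = 2 * one_way              # ~218.4 GB/s for a perfect 1R1W SRAM
quoted_peak = 204e9                       # Microsoft's stated peak

# The quoted peak matches a read every cycle plus a write on 7 of every
# 8 cycles, rather than full dual issue.
seven_eighths = one_way * (1 + 7 / 8)     # ~204.7 GB/s
print(f"true 1R1W peak {dual_port_peak/1e9:.1f} vs quoted {quoted_peak/1e9:.0f}; "
      f"7/8 model gives {seven_eighths/1e9:.1f}")
```

So the puzzle isn't just that real code falls short of peak; it's that the stated peak is already below what a true 1R1W interface at these widths should deliver.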
 
From the SA article, interesting:

To translate from technical minutia to English, good code = 204GBps, bad code = 109GBps, and reality is somewhere in between. Even if you try there is almost no way to hit the bare minimum or peak numbers. Microsoft sources SemiAccurate talked to say real world code, the early stuff that is out there anyway, is in the 140-150GBps range, about what you would expect. Add some reasonable real-world DDR3 utilization numbers and the total system bandwidth numbers Microsoft threw around at the launch seems quite reasonable. This embedded DRAM is however not a cache in the traditional PC sense, not even close. S|A
 
If you take the bandwidth numbers from the DF article about the engineers receiving a surprise performance upgrade and increase them by 7%, you get the same range as that article.
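That 7% is just the 800 MHz to 853 MHz upclock ratio, and the correspondence is easy to sanity-check against the SA real-world range quoted above:

```python
upclock = 853 / 800                  # ~1.066, the "7%" clock bump
sa_low, sa_high = 140e9, 150e9       # SA's reported real-world range, GB/s
pre_low, pre_high = sa_low / upclock, sa_high / upclock
print(f"{pre_low/1e9:.0f}-{pre_high/1e9:.0f} GB/s before the upclock")
```

Dividing SA's range back down by the upclock ratio recovers the pre-upclock range, consistent with both articles describing the same underlying figures.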
 
No one ever said DF made it up, they said DF was regurgitating what MS told them. So we are back to a single source, one does not back the other up. We still have no idea where the numbers came from or how real they are.

LOADS of ppl thought DF made it up all over the internet. And unless my source and DF's source is magically the same (doubtful) then it's not just 1 source. And that's moot anyhow, as it's now official as per HotChips. The question has shifted from being 'IS the DF article correct' to 'HOW did that happen'. MS doesn't need backup when it comes to confirming their own design specs. They ARE the confirmation. HotChips proved that what I was told and what DF was told were accurate info.

Saying one number is 7/8 of the other is not an explanation, it is math. I want to know where the 204GB/s number comes from and what it takes to break the "min" number which used to also be the max.

I never said it was the explanation, I said it was a hypothesis that made a specific and unique prediction. That prediction was true.
 
I never said it was the explanation, I said it was a hypothesis that made a specific and unique prediction. That prediction was true.

Ignoring the sourcing (I always read MS was the source; didn't DF say that? That part isn't interesting anyhow), the question remains "how". What is your prediction? This?

The math works out such that if you are capable of reading/writing during the same cycle for 7/8 cycles you get the quoted 192GB/s figure.

That is the math, which works, but it doesn't explain how MS discovered this and process behind the numbers.
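For what it's worth, that math does reproduce both published peaks. A sketch, assuming 128 B/cycle per direction (4 x 256-bit lanes) and read+write coexisting on 7 of every 8 cycles:

```python
def seven_eighths_bw(clock_hz, bytes_per_cycle=128):
    """Peak if reads issue every cycle and writes join in on 7 of 8 cycles.
    bytes_per_cycle assumes the 4 x 256-bit lane layout, one direction."""
    return clock_hz * bytes_per_cycle * (1 + 7 / 8)

print(f"{seven_eighths_bw(800e6) / 1e9:.0f} GB/s")   # pre-upclock figure: 192
print(f"{seven_eighths_bw(853e6) / 1e9:.0f} GB/s")   # post-upclock figure: ~205
```

The same 7/8 factor yields 192 GB/s at 800 MHz and ~204.7 GB/s at 853 MHz, matching both quoted peaks, which is why the hypothesis is hard to dismiss even though it says nothing about the mechanism.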
 
Because it's potentially a complete revolution within the EE sector!! Seriously, they've come up with a BW-enhancing feature that no-one can fathom, and they didn't talk about it at all? "Hi guys, welcome to HotChips. Today we'll be talking about a games console APU, which is much like any other AMD APU. One special feature we have is that we've found a way to get partial double transfers across a bus, boosting bandwidth by 30% or more. I see you're all pretty excited at that prospect! But we won't talk about that. Instead we're going to discuss some conventional specialist processing blocks in there." :p They didn't present any interesting tech at HotChips, neither their ToF sensor nor how they have achieved something no-one else has for boosting IO BW. I think a lot of us presumed MS would show something more interesting. I honestly don't know why they bothered showing what they did, because it doesn't reflect anything of interest within the EE sector. They kept out all the juicy bits.

You should really read my posts before replying. Again, they had 30 mins to cover everything, including the Kinect hardware. You guys are pretending they hid something from you. And based on the tweets of attendees, it is pretty clear the audience didn't feel similarly to you as to what was shown. The topic of the talk wasn't X1's eSRAM or double pumping it (the latter of which would somehow be a "revolution in EE"? wat?!). It was about the broader architecture.

It's not like anyone actually asked MS about this detail so acting as if they are hiding anything is really dumb. If the spec was being passed around as pure PR it'd be different. This is completely removed from that. It's a symposium. If they wanted to generate PR buzz they'd have hyped the symposium up, they'd have leaned on the new info with specific PR evangelism, and they'd have used a much more creative number that had no reference whatsoever to a minimum threshold.
 
LOADS of ppl thought DF made it up all over the internet. And unless my source and DF's source is magically the same (doubtful) then it's not just 1 source.
That wasn't the case for many posters here. There was serious debate here on what was being measured, the reasons for why it was described as it was, and the circumstances of the discovery.

I never said it was the explanation, I said it was a hypothesis that made a specific and unique prediction. That prediction was true.

Can you state what that prediction was again, and what parts of the disclosure at Hot Chips are you counting as verification?

There are elements of interpretation to any high-level diagram, and that interpretation needs to be stated and defended.

I am also interested in knowing which parts of your claims in this thread you are saying were proven, as you have made claims to the mechanism that have implications. Those implications lead to some kind of performance model that can be compared to the admittedly patchy information about the sustained performance of the implementation.
It's not just matching the arithmetic of 7/8, there were other parts of the DF article and others about the performance profile that need to be reconciled.
 
Ignoring the sourcing (I always read MS was the source, didn't DF say that? That part isn't interesting anyhow). The question remains "how" what is your prediction? This?

Devs told DF what MS was telling them. It was 2nd hand info already by the time it got to DF.

That is the math, which works, but it doesn't explain how MS discovered this and process behind the numbers.

I'm aware. That's the topic of the thread...speculation as to how it is getting that figure. As I've said in this thread and others, it's possible that with the manufacturing shrink, things behave differently than they expected at that scale. I dunno. But I am all but certain it is directly tied to the timing of state changes during the cycles. My source told me that months ago, right after the reveal.

As of now, that math is the only thing that fits. Don't dismiss it so readily. ;)
 
That's the topic of the thread...speculation as to how it is getting that figure...
Well, in your reply to me above you assert MS are simply double-pumping the bus. That's the mechanic that hasn't been proven in this conversation. You've gotten as far as saying, "it could be double pumping that doesn't work some of the time and at most works 7/8 cycles," which is somewhat confusing: surely MS knew they were designing this double-pumped bus, so the results should be consistent and predictable? Or are you suggesting that MS designed the double-pumped bus but couldn't guarantee performance, so listed the lowest possible figure in their specs, and then, when they got final silicon and found what proportion of the clock cycles could be reliably used, updated their specs? :???:
 