esram astrophysics *spin-off*

This is more of a philosophical question than a scientific one. If a memory request is made in the woods where there are no clients, does it contribute to bandwidth?

More seriously, I am curious how absolute that minimum is. Assuming enough requests to allow one access per cycle, is there no access pattern that leads to less than peak?
If there's some worst case code that needs to read one bit at a time, scattered randomly, that would be pretty much the absolute minimum? One bit multiplied by read-ops/s.

That's what throws me off when they say something is maxing out the bandwidth. If something needs to read one bit at a time, and the memory system reads a 1024-bit row every cycle, the "useful" bandwidth is 0.1%, but the "used" bandwidth is 100%.
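To put rough numbers on that (a back-of-the-envelope sketch; the ~853 MHz clock and the one-access-per-cycle rate are my assumptions, not figures from the posts):

```python
# "Useful" vs "used" bandwidth when every access moves a full row
# but the code only wants one bit of it.
row_bits = 1024            # bits moved per access, per the scenario above
wanted_bits = 1            # bits the worst-case code actually needs per access
accesses_per_sec = 853e6   # assumed: one access per cycle at ~853 MHz

used_gbps = row_bits * accesses_per_sec / 8 / 1e9       # GB/s actually transferred
useful_gbps = wanted_bits * accesses_per_sec / 8 / 1e9  # GB/s the code cares about

print(f"used:   {used_gbps:.1f} GB/s (the arrays are 100% busy)")
print(f"useful: {useful_gbps:.3f} GB/s ({wanted_bits / row_bits:.2%} of what was moved)")
```

From the memory system's point of view both numbers describe the same fully-loaded second; only the "useful" one changes with the access pattern.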
 
How can a minimum be 100% BW efficiency and yet "real world" code be around 70% efficiency? Is real world less than minimum? Philosophical or not, it's a vexing question.

This can happen if you aren't measuring the same thing throughout the sentence.

The minimum is a value promised as the capability of the eSRAM with a pure read or write stream, irrespective of what an actual application would request.
Possibly, the internal latency of the arrays is such that this can be managed, I'm not sure. That goes to how ironclad their minimum value is.

Turnaround or recovery latency might be why it becomes much more sensitive when mixing traffic. This is the not-quite-doubled peak.
The "real world" results are between the minimum and peak, and most likely represent portions of high demand in an application.
This starts factoring in things like whether the bandwidth needs match what the eSRAM can support--a nearly 50:50 mix.
Then there is the question of every other thing the code needs besides the eSRAM's bandwidth, or the overheads associated with using it like initial copies or zero-fill.
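To make the 50:50 point concrete, here's a toy model (just arithmetic over the 109 GB/s one-direction figure that comes up later in the thread; it assumes independent read and write paths and ignores turnaround/recovery penalties entirely, so it's an upper bound rather than how the controllers actually arbitrate):

```python
# Toy model: combined eSRAM bandwidth as a function of the read:write mix,
# with each direction capped at the one-direction figure and no penalties.
ONE_WAY_GBPS = 109.0  # one-direction figure discussed in this thread

def combined_gbps(read_fraction: float) -> float:
    """Total read+write GB/s achievable for a given fraction of reads."""
    write_fraction = 1.0 - read_fraction
    # Whichever direction carries more of the demand saturates first.
    return ONE_WAY_GBPS / max(read_fraction, write_fraction)

for r in (1.0, 0.8, 0.6, 0.5):
    print(f"{r:.0%} reads -> {combined_gbps(r):.0f} GB/s combined")
```

Only near a 50:50 mix does the combined figure approach double the one-direction number; skewed mixes fall back toward the minimum, which is part of why sustained results sit between minimum and peak.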

To reiterate, what is the measurement in each part of the question? It's not wrong to have mismatched numbers if the question being asked isn't the same.

edit:

If there's some worst case code that needs to read one bit at a time, scattered randomly, that would be pretty much the absolute minimum? One bit multiplied by read-ops/s.
That's not the sort of bandwidth the eSRAM cares about. The storage doesn't know the context of what it is giving; it just provides data.
It's certainly a valid question as far as the effectiveness of the algorithm, but that's a higher level question than the controllers and arrays can answer.
 

Ok, you've got a 109 GB/s minimum write BW, but then you are only writing 7 out of every 8 cycles. So is your minimum 109 GB/s or somewhere near 94.4 GB/s or something else?
 
Ok, you've got a 109 GB/s minimum write BW, but then you are only writing 7 out of every 8 cycles. So is your minimum 109 GB/s or somewhere near 94.4 GB/s or something else?

The 7/8 cycle issue sounds like it doesn't come up until reads and writes are occurring at the same time. There may be a turnaround penalty or some kind of contention in the logic where reads and writes are running concurrently. There may be more spare capacity in a pure read or pure write case, where the issue logic is the limiter and the other concerns can be pipelined away.
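For what it's worth, the raw duty-cycle arithmetic from the question looks like this (whether that 7/8 haircut applies to a pure write stream at all is exactly the open question):

```python
# One-direction write bandwidth scaled by writes issuing on 7 of every 8 cycles.
ONE_WAY_WRITE_GBPS = 109.0   # figure from the question above
duty_cycle = 7 / 8

print(f"{ONE_WAY_WRITE_GBPS * duty_cycle:.1f} GB/s effective write bandwidth")  # ~95.4
```

That lands around 95 GB/s, the same ballpark as the ~94.4 GB/s figure in the question.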
 
So minimum is minimum until some time passes and then it's less than minimum?

When you perform a read operation, it's going to read a fixed number of bits. With ESRAM I don't know if that number of bits matches the width of the bus, or not. If you just wanted a single byte of information, I don't know if the read pulls the whole 1024 bits (ESRAM bus width?) from the source address or not. Someone else can probably answer that.

Basically, the bandwidth consumed is going to be at least the minimum number of bits a read operation transfers. So if you start looking at bandwidth over a period of a second, it really changes depending on what you're doing. Why a second is considered a useful period of measurement, other than it being a period our brains can comprehend, I do not know.

Understand where I'm going with this?
 
When you perform a read operation, it's going to read a fixed number of bits. With ESRAM I don't know if that number of bits matches the width of the bus, or not. If you just wanted a single byte of information, I don't know if the read pulls the whole 1024 bits (ESRAM bus width?) from the source address or not.
Going by the DF interview, the eSRAM controllers sit on the same side of the memory access crossbar as the DRAM controllers. They receive the same sorts of requests, which should generally be cache line granularity, although I am now curious which cache line that would be for some clients.

As an example, GDDR5 has a 32-bit channel and a burst length of 8, which means data is assembled into a 256-bit chunk and then sent over the external bus.
The eSRAM's lanes are 256 bits wide, which means the same transaction doesn't need to be spit out over a higher-clocked double-pumped bus.
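A quick sketch of the transaction sizes being compared (just the arithmetic from the figures above, plus the 64-byte line size mentioned in the next paragraph):

```python
# Transaction-size arithmetic for the GDDR5 vs eSRAM comparison above.
gddr5_channel_bits = 32
gddr5_burst_length = 8
gddr5_burst_bits = gddr5_channel_bits * gddr5_burst_length   # 256 bits per burst

esram_lane_bits = 256                                        # one eSRAM lane access

cache_line_bytes = 64                                        # GCN CU L1 line (next paragraph)
print(gddr5_burst_bits // 8, "bytes per GDDR5 burst")                     # 32
print(cache_line_bytes * 8 // esram_lane_bits, "lane accesses per line")  # 2
```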

At the time GDDR5 was introduced, GPU cache lines were 256 bits. The GCN CU L1s have doubled that to 64 byte lines, which actually makes things a bit simpler for getting stretches of the same kind of access when all requests are doubly long.
I'm curious what the line length is for the ROP color and depth caches. They don't need to be coherent with an x86 cache, so it may not be strictly necessary that they extend their lines.

There are non-cached accesses which might be able to break some of the assumptions, but generally we're talking about servicing cache line writeback or fills.
Whether every bit is needed is another question entirely, and the memory controller has no good way of knowing.
 
At the time GDDR5 was introduced, GPU cache lines were 256 bits. The GCN CU L1s have doubled that to 64 byte lines, which actually makes things a bit simpler for getting stretches of the same kind of access when all requests are doubly long.
I think Radeon's cache line size is already a bit longer at 64 bytes.
I'm curious what the line length is for the ROP color and depth caches. They don't need to be coherent with an x86 cache, so it may not be strictly necessary that they extend their lines.
They don't have strict line sizes. They cache framebuffer tiles which are probably (at least?) 8x8 pixels in size, so basically larger (256 bytes for 4 bytes per pixel; more with other color formats or MSAA, at least internally; external traffic is done with compressed tiles, which means a variable size).
 
Is that just the logical granularity, or the physical granularity of the ROP caches?

As far as the 32B lines go, I may have been thinking of the GTX 580 L2.
 
I think Radeon's cache line size is already a bit longer at 64 bytes.
They don't have strict line sizes. They cache framebuffer tiles which are probably (at least?) 8x8 pixels in size, so basically larger (256 bytes for 4 bytes per pixel; more with other color formats or MSAA, at least internally; external traffic is done with compressed tiles, which means a variable size).

On older cards at least the tiles were a lot more than 64 pixels. I want to say more like 1024 pixels, but it's been a long time since I looked at a GPU at that level of detail.
 
On older cards at least the tiles were a lot more than 64 pixels. I want to say more like 1024 pixels, but it's been a long time since I looked at a GPU at that level of detail.
It can't be that large as that wouldn't fit the caches, which store the tile in uncompressed format. That means 1024 pixels with 4xMSAA and a 64-bit color format would take 32 kB, but each RBE has just 16 kB of color cache, so the tiles must be smaller. And tiles that are too large also waste bandwidth outside of fillrate tests. I would say the maximum size is likely 16x16 pixels, with 8x8 more likely. Maybe it also depends a bit on the color format, or 8x16 tiles are also possible.
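Putting the same numbers into a small check (the sizes and formats are the ones quoted in these posts; which tile shapes the hardware actually uses is still the open question):

```python
# Does an uncompressed tile fit in a 16 kB per-RBE color cache?
def tile_bytes(pixels, bytes_per_pixel, msaa_samples=1):
    return pixels * bytes_per_pixel * msaa_samples

CACHE_BYTES = 16 * 1024   # per-RBE color cache size cited above

for pixels, bpp, msaa in [(1024, 8, 4),   # 32x32 tile, 64-bit color, 4xMSAA
                          (256, 8, 4),    # 16x16 tile, same format
                          (64, 4, 1)]:    # 8x8 tile, 32-bit color, no MSAA
    size = tile_bytes(pixels, bpp, msaa)
    verdict = "fits" if size <= CACHE_BYTES else "does not fit"
    print(f"{pixels:4d} px, {bpp} B/px, {msaa}x MSAA -> {size:6d} B ({verdict} in 16 kB)")
```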

@3dilettante:
I have no real answer for you. As the tiles probably have a format-dependent, variable size (measured in bytes) within the caches, I don't know if the concept of the usual cache lines applies in a straightforward way to these specialized caches.
 
Might as well put the quote.

Look, we’re not pushing polygon counts like a lot of the big devs. We’re likely not taking the hardware to it’s breaking point with Below. It took us quite a while to make our way through all of the documentation, but we’ve really enjoyed working on the Xbox One platform. We’ve had no ESRAM bottlenecks.
 
Might as well put the quote.
I can't quite bring myself to change my view on the news article because of that quote; he just says the eSRAM isn't a bottleneck for them, but well, maybe not a universal truth. Point taken...
Not pushing the console and not hitting any limits.

In other news night will follow day.
:eek: The moon can never be as high as the midday sun.

New article on the eSRAM.

http://gamingbolt.com/why-xbox-ones...-to-store-6gb-of-tiled-textures-using-dx-11-2
 
Point taken...:eek:

I didn't really have a point, but why make people click to read something so short? eSRAM not being a bottleneck means what exactly? eSRAM is supposed to be the solution to a bottleneck. What he probably means is he can get by fine with the DDR3 and bandwidth is not a bottleneck.
 
I didn't really have a point, but why make people click to read something so short? eSRAM not being a bottleneck means what exactly? eSRAM is supposed to be the solution to a bottleneck. What he probably means is he can get by fine with the DDR3 and bandwidth is not a bottleneck.

I think he means exactly what he said: "We’ve had no ESRAM bottlenecks." Especially in the context of the latest ramblings about eSRAM being a bottleneck, that seems the most likely interpretation.
 
I think he means exactly what he said: "We’ve had no ESRAM bottlenecks." Especially in the context of the latest ramblings about eSRAM being a bottleneck, that seems the most likely interpretation.

He was responding to a question, not Internet ramblings. So it is open to interpretation.
 
I don't think eSRAM was ever considered a bottleneck anyway.

The relatively small size limits its usefulness, alright, but that really just reduces its capability to solve bottlenecks instead of being one. Bandwidth-wise, we can probably all agree that it has more than enough to serve 32MB.
 