esram astrophysics *spin-off*

That article also states the eSRAM is running at 2x the GPU clock.
That is another hypothetical route to higher bandwidth, although the article doesn't explain the min/peak asymmetry or some of the latest statements about the architecture.
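A quick back-of-the-envelope on what a flat 2x clock would imply. This is my own arithmetic, not the article's, and the 1024-bit (128-byte) interface width is an assumption consistent with the leaked 102.4 GB/s at 800 MHz:

bus_bytes = 128                        # assumed 1024-bit eSRAM interface
clock_hz = 853e6                       # post-upclock GPU clock
print(bus_bytes * clock_hz / 1e9)      # ~109.2 GB/s at 1x clock
print(bus_bytes * clock_hz * 2 / 1e9)  # ~218.4 GB/s at a flat 2x clock

A flat 2x clock would give a constant ~218 GB/s, not a 109 minimum with a 204 peak, so the asymmetry would still need explaining.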
 
read my post #212
 
The PR statements and GAF-based dialogue between Penello and everyone else are worthless to this discussion unless he happens to explain the eSRAM BW in more detail. If he doesn't, that topic has nothing to do with this thread.
 
Unfortunately, I don't think that is consistent with the article's math, either.
It's written as if it assumes the final post-bump bandwidth was what was available prior to the upclock, and that the upclock might have raised it further.

edit:
However, it does claim that the eSRAM is heavily banked internally.
 
The SA article (I assume this is what you are referring to) is so badly written, with tons of errors, that I am doubtful the writer is all that technical.

edit:
I'd be happy to re-examine my theory if you can show me where you found discrepancies in my math.
 
I was addressing the SemiAccurate article. I thought you were referencing the earlier post to explain what the article wrote, but the article's math didn't match yours.
 
Ah, I see. No, I was trying to work out the math based on the VGLeaks numbers, the upclock adjustment, and the Hot Chips numbers.
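Roughly, the arithmetic I was working through (the figures are the leaked VGLeaks number and the 800 -> 853 MHz upclock; the reconciliation with Hot Chips is the open question):

vgleaks_bw = 102.4          # GB/s, leaked pre-upclock eSRAM figure
upclock = 853.0 / 800.0     # GPU clock bump
one_way = vgleaks_bw * upclock
print(one_way)              # ~109.2 GB/s -> matches the quoted 109
print(one_way * 2)          # ~218.4 GB/s -> naive read+write doubling
# Hot Chips quoted a 204 GB/s peak, which a clean doubling doesn't give.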
 
Seems that the 204 number is wrong...

Albert Penello said:
Freki said:
So why is the bidirectional bandwidth of your eSRAM 204GB/s although your one-directional bandwidth is 109GB/s - shouldn't it be 218GB/s?

Yes, it should be. And I was quickly corrected (both on the forum and from people at the office) for writing the wrong number.

...

I still stand by what I stated (except for the aforementioned 204/218). In fact, some really interesting threads were going back and forth giving me even more excruciating detail behind those numbers based on the questions people asked.

Someone did follow up with a question asking whether the Hot Chips slides were wrong too, since they used the 204 number. He hasn't replied to that. It seems like he's getting answers to people's questions, but it will take a week or two to get them.

http://www.neogaf.com/forum/showpost.php?p=81372357&postcount=632
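For what it's worth, one reconciliation that has been floated elsewhere (an assumption on my part, not something Penello has confirmed) is that a write can pair with a read on only seven of every eight cycles:

one_way = 109.2                  # GB/s, single-direction bandwidth
peak = one_way * (1 + 7 / 8.0)   # write rides along on 7 of 8 cycles
print(peak)                      # ~204.75 GB/s -> the 204 figure
print(102.4 * (1 + 7 / 8.0))     # 192 GB/s -> the earlier pre-upclock leak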

Tommy McClain
 
With everyone talking about off-topic system bandwidth instead of how the eSRAM bus manages its flexi-speed, I'll close this thread until I have time to clean it up and spawn a new one.

XB1's Bandwidth discussion here.
 
Maybe this AMD patent is applicable to the eSRAM.

Abstracting scratch pad memories as distributed arrays
http://www.google.com/patents/US20130212350

FIG. 4 is a block diagram of a memory model showing how the distributed array may map to physical memory.

Many programming models (for example, in graphics processing units (GPUs), heterogeneous computing systems, or embedded architectures) have to control access to multiple levels of a memory hierarchy. In a memory hierarchy, certain memories are close to where the operations happen (e.g., an arithmetic logic unit), while other memories are located farther away. These different memories have different properties, including latency and coherency. With latency, the farther away the memory is located from where the operation happens, the longer the latency. With coherency, when a memory is located closer to the chip, it may not be able to see some reads and writes in other parts of the chip. This has led to complicated programming situations dealing with addresses in multiple memory spaces.

Abstracting multiple memory spaces may be done to improve portability (for example, when such disjoint memory spaces do not exist on a target architecture), improve compiler optimizations, support dependence tracking, and provide automated data persistence.

A distributed array is declared in terms of segments, a number of which will be allocated at run time to match the number of executing groups. Each executing group has access to its own segment. By using segments, the programmer may focus on accessing the data, rather than where the data is mapped to.

In a computing system, memory may be managed by using a distributed array, which is a global set of local memory regions. A segment in the distributed array is allocated and is bound to a physical memory region. The segment is used by a workgroup in a dispatched data parallel kernel, wherein a workgroup includes one or more work items. When the distributed array is declared, parameters of the distributed array may be defined. The parameters may include an indication whether the distributed array is persistent (data written to the distributed array during one parallel dispatch is accessible by work items in a subsequent dispatch) or an indication whether the distributed array is shared (nested kernels may access the distributed array). The segment may be deallocated after it has been used.
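To make that lifecycle concrete (declare with parameters, allocate and bind one segment per group, use, deallocate), here is a toy model in Python. All names here are made up for illustration; they are not from the patent:

class Segment:
    """Stands in for the physical memory region a segment is bound to."""
    def __init__(self, size):
        self.data = bytearray(size)

class DistributedArray:
    """Toy model: a global set of per-workgroup local memory segments."""
    def __init__(self, segment_size, persistent=False, shared=False):
        self.segment_size = segment_size
        self.persistent = persistent   # data survives across dispatches
        self.shared = shared           # nested kernels may access it
        self.segments = {}             # group_id -> Segment

    def bind(self, num_groups):
        # Allocate one segment per executing group at dispatch time,
        # reusing existing segments when the array is persistent.
        for gid in range(num_groups):
            if gid not in self.segments:
                self.segments[gid] = Segment(self.segment_size)

    def segment_for(self, group_id):
        # Each workgroup accesses only its own segment.
        return self.segments[group_id]

    def release(self):
        # Deallocate after use, unless the array was declared persistent.
        if not self.persistent:
            self.segments.clear()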

DETAILED DESCRIPTION
In a heterogeneous computing system, memory may be managed by using a distributed array, which is a global set of local memory regions. To use the distributed array, it is first declared along with optional parameters. The parameters may include an indication whether the distributed array is persistent (data written to the distributed array during one parallel dispatch is accessible by work items in a subsequent dispatch) or an indication whether the distributed array is shared (meaning that nested kernels may access the distributed array). A segment in the distributed array is allocated for use and is bound to a physical memory region. The segment is used by a workgroup (including one or more work items) dispatched as part of a data parallel kernel, and may be deallocated after it has been used.

A distributed array provides an abstraction through a uniform interface in terms of reads and writes, and can guarantee coherency. Accesses to memory are partitioned, such that how the user programs the memory access is how the memory access is compiled down to the machine. The properties that the programmer provides to the memory determine which physical memory it gets mapped to. The programmer does not have to specify (as under the OpenCL model) whether the memory is global, local, or private. The implementation of the distributed array maps to these different memory types because it is optimized to the hardware that is present and to where a work item is dispatched.

With the distributed array, memory may be defined to be persistent, such that it is loaded into local regions and can be stored back out again to more permanent storage if needed. The distributed array may be made persistent if the next workgroup needs this same data; for example, if the output of one workgroup is the input to the next workgroup. Workgroups may be scheduled to run on the same core, so that the workgroups can access the memory and eliminate the copy in/copy out overhead for later workgroups.
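With the toy class above, the persistence idea looks like this (purely illustrative):

arr = DistributedArray(segment_size=64, persistent=True)
arr.bind(num_groups=4)
arr.segment_for(0).data[0] = 42          # workgroup 0, producer dispatch
arr.release()                            # no-op: declared persistent
arr.bind(num_groups=4)                   # consumer dispatch, same core
assert arr.segment_for(0).data[0] == 42  # sees the data, no copy in/out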

The distributed array is described in terms of segments, wherein the distributed array is a representation of a global set of all local memory regions. When bound to a parallel kernel launch, each segment of the distributed array can be accessed from one defined subgroup of the overall parallel launch, including a subset of individual work items. In the described embodiment, the subset would be the parallel workgroup within the overall parallel dispatch. Access from outside that subgroup may or may not be possible depending on defined behavior. The segment may be allocated at run time, or may be persistent due to a previous execution. If a segment is allocated, that segment may be explicitly passed into another launch, so a particular block of data can be identified and passed to a particular consuming task.

With the distributed array, it is possible to perform a depth-first optimization, in which all consecutive work that relies on one particular block of data is run before moving on to the next block of data. The distributed array is used instead of the current OpenCL-style memory model, with a large data parallel operation that writes a large amount of data to memory, reads a large amount of data back in, and so on. The depth-first optimization changes the order of execution based on the data dependencies, rather than the original data parallel construction, and allows for a more flexible execution pattern.
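The two execution orders can be sketched like so (my own illustration, not patent text):

# Breadth-first (OpenCL-style): each pass sweeps all blocks, so every
# intermediate result is written out to memory and read back in.
def breadth_first(blocks, passes):
    for run_pass in passes:
        blocks = [run_pass(b) for b in blocks]
    return blocks

# Depth-first: all consecutive work on one block runs before moving to
# the next, so the intermediate stays resident in its local segment.
def depth_first(blocks, passes):
    out = []
    for b in blocks:
        for run_pass in passes:
            b = run_pass(b)
        out.append(b)
    return out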
 
No, "lower resolution render targets" can mean things like mirror reflections, environment maps, shadow maps, tone maps, etc., which are all intermediary outputs in a multi-pass rendering process. You are drawing a conclusion from something that's completely unrelated.

Or it can mean just that, a lower resolution.
 
You asked a question on a subject you don't understand, I gave you an answer, and you didn't like the answer and wanted to believe whatever you already believe, so why bother asking in the first place?
 
It can mean just lower resolutions overall as well as lower-than-1080p individual component resolutions. "Render targets" can be 'render targets' as per rendering terminology, or 'targets for our game to render at'.
 
Because, as Shifty has pointed out, it can mean just that as well. The article reads as dealing with overall resolution in this case...
 
I seem to remember reading that the final render target should be stored in DDR3, but depending on which intermediaries are put into ESRAM, I suppose that could impact the final target as well.
 
I see a lot of people on GAF "blame" the 360's EDRAM for some sub-720p games. Yet when you look at the big picture, AFAIK the 360 had fewer sub-720p games, and often slightly higher-resolution multiplats, than the PS3, which did not feature any EDRAM. So it's hard to conclude that the EDRAM was holding resolution back; in fact, the evidence suggests the opposite.

So it's a complex issue at best.
 
I seem to remember reading that the final render target should be stored in DDR3, but depending on which intermediaries are put into ESRAM, I suppose that could impact the final target as well.

I think bkilian mentioned that the final framebuffer is written to DRAM, and Gipsel mentioned that the front buffer must reside in DRAM. I was curious about those comments as well.

I have to imagine, however, that using the ESRAM for lots of framebuffer work, if not the final framebuffer, must be something MS thought valuable, since it is a way to ease folks from the 360 way of doing things into the XB1 way. And since they profiled the 360 to hell and back before designing the XB1, I would think there must be some mode or set of drivers that helps move things along in that direction. A better-performing, 360-EDRAM-like system would seem an obvious path to optimize for.
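Purely as a hypothetical sketch of the kind of placement policy I mean (not MS's actual API or driver behavior): intermediates go to ESRAM while they fit, and the final target lives in DDR3:

ESRAM_BUDGET = 32 * 1024 * 1024         # the 32 MB of ESRAM

def place_targets(targets):
    # targets: list of (name, size_bytes, is_final) tuples (hypothetical)
    used, placement = 0, {}
    for name, size, is_final in targets:
        if not is_final and used + size <= ESRAM_BUDGET:
            placement[name] = "ESRAM"   # bandwidth-hungry intermediates
            used += size
        else:
            placement[name] = "DDR3"    # final target and any overflow
    return placement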

That is the thing. Drivers are important in general, but I wonder if the drivers for the XB1 are even more important, since they are virtualized, and performance inside the gaming virtual machine may be affected more than on other consoles. Just a passing thought.
 