DETAILED DESCRIPTION
In a heterogeneous computing system, memory may be managed by using a distributed array, which is a global set of local memory regions. To use the distributed array, it is first declared along with optional parameters. The parameters may include an indication of whether the distributed array is persistent (meaning that data written to the distributed array during one parallel dispatch is accessible by work items in a subsequent dispatch) or an indication of whether the distributed array is shared (meaning that nested kernels may access the distributed array). A segment in the distributed array is allocated for use and is bound to a physical memory region. The segment is used by a workgroup (which includes one or more work items) dispatched as part of a data parallel kernel, and may be deallocated after it has been used.
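By way of a non-limiting illustration, the declaration and segment lifetime described above may be sketched in C++ as follows; the DistArrayDesc, DistributedArray, and Segment names and the persistent/shared flags are hypothetical names chosen for this sketch and do not denote an existing API.

#include <cstddef>
#include <new>

struct DistArrayDesc {
    std::size_t segment_bytes;  // size of each local segment
    bool persistent;            // data survives into a subsequent dispatch
    bool shared;                // nested kernels may access the array
};

struct Segment {
    void* phys_region = nullptr;  // physical memory region the segment is bound to
    std::size_t bytes = 0;
};

class DistributedArray {
public:
    explicit DistributedArray(DistArrayDesc d) : desc(d) {}

    // Allocate a segment and bind it to a physical memory region.
    Segment allocate_segment() {
        Segment s;
        s.bytes = desc.segment_bytes;
        s.phys_region = ::operator new(desc.segment_bytes);
        return s;
    }

    // Release the segment once the workgroup that used it is finished.
    void deallocate_segment(Segment& s) {
        ::operator delete(s.phys_region);
        s.phys_region = nullptr;
        s.bytes = 0;
    }

private:
    DistArrayDesc desc;
};

int main() {
    // Declare the distributed array with the optional persistent/shared parameters.
    DistributedArray da({/*segment_bytes=*/4096, /*persistent=*/true, /*shared=*/false});

    Segment seg = da.allocate_segment();  // segment bound to a physical region
    // ... dispatch a data parallel kernel whose workgroup uses 'seg' ...
    da.deallocate_segment(seg);           // segment may be deallocated after use
}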
A distributed array provides an abstraction through a uniform read/write interface and can guarantee coherency. Accesses to memory are partitioned, such that the memory access as the user programs it is how the memory access is compiled down to the machine. The properties that the programmer specifies for the memory determine which physical memory it is mapped to. The programmer does not have to specify (as under the OpenCL model) whether the memory is global, local, or private. The implementation of the distributed array maps to these different memory types because it is optimized for the hardware that is present and for where a work item is dispatched.
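By way of further non-limiting illustration, one possible property-driven mapping rule may be sketched as follows; the two PhysicalMemory categories and the selection logic are assumptions made for this sketch rather than a required implementation.

#include <cstddef>

// Two example backing memories and one possible selection rule; both are
// assumptions for this sketch, not a required implementation.
enum class PhysicalMemory { LocalScratchpad, GlobalDram };

PhysicalMemory map_segment(std::size_t bytes, std::size_t scratchpad_bytes,
                           bool persistent, bool consumer_on_same_core) {
    if (bytes > scratchpad_bytes)
        return PhysicalMemory::GlobalDram;   // too large for on-core local memory
    if (persistent && !consumer_on_same_core)
        return PhysicalMemory::GlobalDram;   // must be stored back out for the consumer
    return PhysicalMemory::LocalScratchpad;  // fast on-core placement
}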
With the distributed array, memory may be defined to be persistent, such that it is loaded into local regions and can be stored back out again to more permanent storage if needed. The distributed array may be made persistent if the next workgroup needs the same data; for example, if the output of one workgroup is the input to the next workgroup. Workgroups may be scheduled to run on the same core, so that later workgroups can access the memory in place, eliminating the copy-in/copy-out overhead for those workgroups.
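Continuing the illustrative sketch above, a persistent segment may be reused across two dispatches as follows; the dispatch() helper, the ScheduleHint values, and the producer/consumer kernels are likewise hypothetical placeholders.

#include <cstdio>
#include <cstring>

enum class ScheduleHint { AnyCore, SameCore };

// Stand-in for launching a data parallel kernel whose workgroup uses the segment.
void dispatch(void (*kernel)(Segment&), Segment& seg, ScheduleHint) {
    kernel(seg);
}

void producer_kernel(Segment& seg) {
    std::memset(seg.phys_region, 1, seg.bytes);  // workgroup writes its output into the segment
}

void consumer_kernel(Segment& seg) {
    unsigned char first = static_cast<unsigned char*>(seg.phys_region)[0];
    std::printf("consumer read %d in place\n", static_cast<int>(first));  // same data, no reload
}

int main() {
    DistributedArray da({4096, /*persistent=*/true, /*shared=*/false});
    Segment seg = da.allocate_segment();

    // The producing workgroup writes, and the consuming workgroup (scheduled
    // to the same core) reads the persistent segment without a copy-out to
    // more permanent storage and a copy-in again.
    dispatch(producer_kernel, seg, ScheduleHint::AnyCore);
    dispatch(consumer_kernel, seg, ScheduleHint::SameCore);

    da.deallocate_segment(seg);
}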
The distributed array is described in terms of segments, wherein the distributed array is a representation of a global set of all local memory regions. When bound to a parallel kernel launch, each segment of the distributed array can be accessed from one defined subgroup of the overall parallel launch, that is, a subset of the individual work items. In the described embodiment, the subset would be the parallel workgroup within the overall parallel dispatch. Access from outside that subgroup may or may not be possible, depending on the defined behavior. The segment may be allocated at run time, or may be persistent due to a previous execution. If a segment is allocated, that segment may be explicitly passed into another launch, so that a particular block of data can be identified and passed to a particular consuming task.
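As a further non-limiting sketch reusing the hypothetical DistributedArray and Segment types above, one segment may be allocated per workgroup of a producing launch, and an individual segment may then be passed explicitly into a consuming launch.

#include <vector>

int main() {
    DistributedArray da({4096, /*persistent=*/true, /*shared=*/false});

    // One segment is bound per workgroup of the producing launch; each segment
    // is accessible from the work items of its own workgroup.
    const int num_workgroups = 4;
    std::vector<Segment> segments;
    for (int wg = 0; wg < num_workgroups; ++wg)
        segments.push_back(da.allocate_segment());

    // ... launch the producing kernel; workgroup wg writes into segments[wg] ...

    // A particular block of data can then be identified and handed to a
    // particular consuming task by passing its segment into the next launch,
    // e.g. dispatch(consuming_kernel, segments[2], ScheduleHint::SameCore);

    for (Segment& s : segments)
        da.deallocate_segment(s);
}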
With the distributed array, it is possible to perform a depth-first optimization, in which all consecutive work that relies on one particular block of data is run before moving on to the next block of data. The distributed array is used instead of the current OpenCL-style memory model, in which one large data parallel operation writes a large amount of data to memory, the next operation reads that data back in, and so on. The depth-first optimization changes the order of execution based on the data dependencies, rather than on the original data parallel construction, and allows for a more flexible execution pattern.
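By way of a non-limiting illustration of the reordering only, with Block, stage_a, and stage_b as placeholder names, the depth-first ordering may be contrasted with the OpenCL-style ordering as follows.

#include <vector>

struct Block { /* one block of data held in a distributed array segment */ };
void stage_a(Block&) { /* first data parallel stage */ }
void stage_b(Block&) { /* second stage, consuming stage_a's output */ }

// OpenCL-style ordering: each data parallel stage runs over every block,
// writing all of its output to memory before the next stage reads it back in.
void breadth_first(std::vector<Block>& blocks) {
    for (Block& b : blocks) stage_a(b);
    for (Block& b : blocks) stage_b(b);
}

// Depth-first ordering: all consecutive work that relies on one block runs
// before moving on to the next block, so the block can stay in its segment.
void depth_first(std::vector<Block>& blocks) {
    for (Block& b : blocks) {
        stage_a(b);
        stage_b(b);
    }
}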