ESRAM astrophysics *spin-off*

Exactly. My point is that, looking at the context of how the ESRAM is used as scratchpad memory, it relates more to being a software-managed cache, e.g. for intermediate targets. At 32 MB, it just wouldn't make sense that it is somehow restricting the native resolution.

360 launch titles such as PDZ were 640p, but Halo 4 managed far better graphics at 720p by the end of the life cycle.

Forza 5 does 1080p at a silky-smooth 60 fps, so what does that say about the ESRAM?
 
Forza 5 is likely using a forward renderer, so the framebuffer requirements would be lower.
 
That's based on the observation that there's practically no dynamic lighting in their environments or cars.

For the series on 360, their use of MSAA whilst hitting 60fps is enough of an indicator.
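To put rough numbers on why a forward renderer is lighter on the framebuffer, here's a back-of-the-envelope sketch. The formats are assumptions for illustration (RGBA8 color, 32-bit depth, a four-target G-buffer for the deferred case), not Forza's actual setup:

```python
# Back-of-the-envelope framebuffer footprints at 1080p.
# Assumed formats (illustrative, not Forza's actual setup):
#   forward:  1x RGBA8 color + 32-bit depth   =  8 bytes/pixel
#   deferred: 4x RGBA8 G-buffer MRTs + depth  = 20 bytes/pixel
W, H = 1920, 1080
PIXELS = W * H

def mib(bytes_per_pixel):
    return PIXELS * bytes_per_pixel / 2**20

forward = mib(4 + 4)        # single color target + depth
deferred = mib(4 * 4 + 4)   # four MRTs + depth

print(f"forward  @ 1080p: {forward:5.1f} MiB")   # ~15.8 MiB, fits in 32 MB ESRAM
print(f"deferred @ 1080p: {deferred:5.1f} MiB")  # ~39.6 MiB, does not fit whole
```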
 
I think bkilian mentioned the final frame buffer being written to DRAM, and Gipsel mentioned that the front buffer must reside in DRAM. I was curious about those comments as well.

Unlike the 360, there is no technical reason I'm aware of that the front buffer must be in the DDR3 pool, as display plane data can come directly from ESRAM. I think it's likely just a waste of valuable ESRAM space versus the minimal memcopy bandwidth cost of evicting it in most cases.
 
I see a lot of people on GAF "blame" the 360's EDRAM for some sub-720p games. Yet when you look at the big picture, AFAIK the 360 had fewer sub-720p games, and often slightly higher-resolution multiplats, than the PS3, which did not feature any EDRAM. So it's hard to conclude that the EDRAM was holding resolution back; if anything, the evidence suggested the opposite.

So it's a complex issue at best.
I think the EDRAM initially led to sub-720p games, as some developers didn't want to tile with 2x MSAA. Once developers discovered most gamers didn't notice the lower resolution, it caught on.
 
Unlike the 360, there is no technical reason I'm aware of that the front buffer must be in the DDR3 pool, as display plane data can come directly from ESRAM. I think it's likely just a waste of valuable ESRAM space versus the minimal memcopy bandwidth cost of evicting it in most cases.

Ah yes, I didn't mean to say the entire ESRAM would be used, merely a portion of it. Again, it's simply because it's a flexible piece of high-performance memory, and using it in a way developers are used to might be beneficial, at least early in the console's life.
 
I see a lot of people on GAF "blame" the 360's EDRAM for some sub-720p games. Yet when you look at the big picture, AFAIK the 360 had fewer sub-720p games, and often slightly higher-resolution multiplats, than the PS3, which did not feature any EDRAM. So it's hard to conclude that the EDRAM was holding resolution back; if anything, the evidence suggested the opposite.

So it's a complex issue at best.
The EDRAM wasn't big enough to hold a full 720p buffer with MSAA; it needed tiling. Hence sub-HD buffers with MSAA, or full 720p with no anti-aliasing or screen-space anti-aliasing.
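For reference, the arithmetic behind that (assuming 32-bit color plus 32-bit depth/stencil, with every MSAA sample stored individually, which is how the 360's EDRAM worked):

```python
# Why 720p + MSAA needed tiling on the 360's 10 MB EDRAM.
# Assumes 32-bit color + 32-bit depth/stencil, every sample stored.
EDRAM_BYTES = 10 * 2**20
W, H = 1280, 720

def framebuffer_bytes(samples):
    return W * H * (4 + 4) * samples  # (color + depth) per sample

for samples in (1, 2, 4):
    size = framebuffer_bytes(samples)
    verdict = "fits" if size <= EDRAM_BYTES else "needs tiling"
    print(f"720p {samples}xMSAA: {size / 2**20:5.1f} MiB -> {verdict}")
# 720p 1xMSAA:   7.0 MiB -> fits
# 720p 2xMSAA:  14.1 MiB -> needs tiling
# 720p 4xMSAA:  28.1 MiB -> needs tiling
```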
 
Forza 5 is likely using a forward renderer, so the framebuffer requirements would be lower.
Aren't people making castles in the sky? I mean, games on the Xbox 360 had to render into the EDRAM whether you liked it or not, whereas the purpose of the eSRAM is not to hold the whole framebuffer. It is a fully independent memory: it doesn't need the DDR3 at all, the DDR3 doesn't need it either, and the GPU has access to both.
 
Aren't people making castles in the sky?

:?: That doesn't change that it has different requirements and fewer complications than MRT setups.

Saying that game X can do 1080p60 easily bears little meaning when compared to another game with a vastly different technical setup.
 

Well, that kinda goes back to the original topic: taking Ryse at 900p and trying to make a generalization about ESRAM and 1080p would be subject to the same reasoning, don't you agree?
 
There's a concern because MRTs inherently use more memory, and there's currently no public knowledge of whether devs can split those MRTs between the two memory spaces, which might alleviate that issue.

Folks only have their experience with 360, which did have certain penalties, but there's no particular reason to conclude Durango is subject to the same problems.

Clearly, there can be other reasons for choosing a particular resolution (e.g. performance profiling, upscaling factors, main memory). Obviously more pixels cost more! ;).

Until we have further clarification (perhaps Digital Foundry has that), there's room to question.
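For scale, here's the raw pixel-count difference between the two resolutions in question (simple arithmetic, nothing console-specific):

```python
# Raw pixel-count comparison: 900p (Ryse) vs. 1080p (Forza 5).
pixels_900p = 1600 * 900    # 1,440,000
pixels_1080p = 1920 * 1080  # 2,073,600

ratio = pixels_1080p / pixels_900p
print(f"1080p shades/stores {ratio:.2f}x the pixels of 900p")  # 1.44x
# ~44% more pixels per frame, before any difference in render-target
# count or format is even considered.
```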
 
There's a concern because MRTs inherently use more memory, and there's currently no public knowledge of whether devs can split those MRTs between the two memory spaces, which might alleviate that issue.
Of course they can. Why shouldn't they? It doesn't make much sense otherwise.
It should even be possible to have just a part (or parts) of a single render target in eSRAM and the other part(s) in DRAM, if the API allows setting the mapping in a more fine-grained manner (the hardware should be fine with it, as each GPU memory access is supposed to go through a page table determining whether the page resides in eSRAM or DRAM).
That's coming from the vgleaks stuff, which is public knowledge.
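A minimal sketch of what that could look like, assuming 64 KB pages and a page-granular placement choice per the vgleaks description; the function and names here are hypothetical illustrations, not any real Durango API:

```python
# Hypothetical sketch: placing one render target partly in eSRAM and
# partly in DRAM at page granularity. The 64 KB page size and this
# "API" are assumptions for illustration, not a real Durango interface.
PAGE = 64 * 1024

def place_render_target(size_bytes, esram_budget_bytes):
    """Per-page placement: use eSRAM while the budget lasts, then
    spill the remaining pages to DRAM."""
    pages = -(-size_bytes // PAGE)  # ceiling division
    in_esram = min(pages, esram_budget_bytes // PAGE)
    return ["eSRAM"] * in_esram + ["DRAM"] * (pages - in_esram)

# e.g. a 1080p RGBA8 target (~7.9 MiB) with only 6 MiB of eSRAM free:
placement = place_render_target(1920 * 1080 * 4, 6 * 2**20)
print(placement.count("eSRAM"), "pages in eSRAM,",
      placement.count("DRAM"), "pages in DRAM")  # 96 and 31
```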
 
Now that the full DF transcript of the interview with Microsoft's designers has been published, I figured I would tidy up some of the discussion earlier about what the eSRAM could be doing.

Per the designers, the eSRAM is divided into four 8 MB lanes, each with its own controller.
Internally, those chunks are subdivided into 8 modules.
The controller is able to issue accesses to different modules in a cycle, which allows for simultaneous reads and writes at the level of the lane if not the individual SRAM arrays themselves.
The description goes further, indicating that hitting the same areas repeatedly causes a loss of bandwidth, because the individual components themselves cannot service multiple accesses at once.
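A toy model of that banking behavior (my own illustration, not the actual controller logic): with 4 lanes of 8 modules and each module servicing at most one access per cycle, spread accesses complete together while accesses piling onto one module serialize.

```python
# Toy model of the banked layout: 4 lanes x 8 modules, each module
# servicing at most one access per cycle. Not the real controller,
# just the conflict behavior described above.
from collections import Counter

LANES, MODULES = 4, 8

def cycles_needed(accesses):
    """accesses: (lane, module) pairs issued together; accesses that
    collide on the same module must serialize."""
    return max(Counter(accesses).values())

spread = [(l, m) for l in range(LANES) for m in range(MODULES)]
conflict = [(0, 0)] * 32  # everything hammering lane 0, module 0

print("spread across all modules:", cycles_needed(spread), "cycle(s)")   # 1
print("all on one module:        ", cycles_needed(conflict), "cycle(s)") # 32
```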

The explanation for why peak bandwidth isn't twice that of the base bandwidth is that writes introduce bubbles in the access pipeline.
It seems that if there weren't concurrent read activity, the write bubbles can be hidden.
Some of my earlier examples as to how this can happen would need to be modified to take into account that there are 8 modules, and the eSRAM slightly favors reads over writes, which is not an assumption I made.
The behavior of the eSRAM bears some similarities to how the external memory controllers would work for a heavily banked DRAM device, and the point is made that they sit at the same level of the memory hierarchy, on the other side of a large crossbar.
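Plugging in the publicly quoted figures (an 853 MHz clock and a 128-byte path in each direction), with the write bubble modeled as costing roughly one cycle in eight; the bubble cadence is my reading of the interview, not a stated spec:

```python
# Reconstructing the quoted bandwidth figures. Clock and path width are
# from the DF interview; the "bubble every 8th write cycle" cadence is
# my interpretation of the designers' explanation.
CLOCK_HZ = 853e6
BYTES_PER_CYCLE = 128  # 1024-bit path, each direction

read_bw = CLOCK_HZ * BYTES_PER_CYCLE  # ~109 GB/s: the original
                                      # "minimum" design figure
write_bw = read_bw * 7 / 8            # writes land ~7 cycles in 8
                                      # when reads are also flowing
peak_bw = read_bw + write_bw          # ~204 GB/s combined, short of
                                      # the naive doubling (~218)

print(f"read only: {read_bw / 1e9:6.1f} GB/s")
print(f"combined:  {peak_bw / 1e9:6.1f} GB/s")
```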


My interpretation of the story they gave for the "doubled" bandwidth discovery was that the original minimum bandwidth number was decided upon and given as a design parameter for software development and planning before the eSRAM and its control logic was actually designed.
This isn't quite as early as the cocktail napkin stage, but it was well before the design was fleshed out. The description of the design that exists now shows that the eSRAM is structured to have separate paths for reads and writes, and it takes advantage of banked accesses to get simultaneous traffic.
 

So can you put this hypothesis into an equation so it can be evaluated?

And isn't minimum bandwidth 0?
 
This is more of a philosophical question than a scientific one. If a memory request is made in the woods where there are no clients, does it contribute to bandwidth?

More seriously, I am curious how absolute that minimum is. Assuming enough requests to allow one access per cycle, is there no access pattern that leads to less than peak?
 
How can a minimum be 100% BW efficiency and yet "real world" code be around 70% efficiency? Is real world less than minimum? Philosophical or not, it's a vexing question.
 
Well, it's all a matter of how long a period of time you're looking at, no? You'd be reading at full bandwidth for a number of clock cycles, fractions of a second.
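A trivial illustration of that windowing point, with a made-up duty cycle: a burst at peak followed by stalls averages out to the kind of ~70% figure being discussed.

```python
# Toy windowing example with made-up numbers: bursts at peak bandwidth
# interleaved with stalls average out to a lower "real world" figure.
PEAK_GBPS = 204
busy, idle = 7, 3  # assumed duty cycle: 7 busy cycles per 3 idle

average = PEAK_GBPS * busy / (busy + idle)
print(f"average over the window: {average:.0f} GB/s "
      f"({average / PEAK_GBPS:.0%} of peak)")  # 143 GB/s, 70%
```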
 