Full BC on PS3 (proven via Jailbreak) *spawn

Great response. For some reason it didn't occur to me that the pointers could just be redirected and you'd hold a translation table.

A first thought, however: the setup will still suffer some loss regardless. Data is likely laid out serially for prefetch, and with this translation you're likely going to be jumping all over memory. I think there will still be heavy penalties.
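The translation-table idea being discussed can be sketched in a few lines. The aperture addresses below are invented purely for illustration (they are not real Xbox One physical addresses), and a real emulator would do this at the page-table level rather than per access:

```c
#include <stdint.h>

/* Hypothetical guest-address layout, for illustration only. */
#define ESRAM_BASE   0x80000000u          /* assumed guest esram aperture base */
#define ESRAM_SIZE   (32u * 1024 * 1024)  /* 32 MB, as on Xbox One             */
#define UNIFIED_BASE 0x20000000u          /* assumed slot in the unified pool  */

/* Redirect a guest address: accesses that fall in the esram aperture are
 * remapped into a reserved region of the single GDDR-class pool; all other
 * addresses pass through unchanged. */
static uint32_t translate(uint32_t guest_addr)
{
    if (guest_addr - ESRAM_BASE < ESRAM_SIZE)
        return UNIFIED_BASE + (guest_addr - ESRAM_BASE);
    return guest_addr;
}
```

This is the "pointers just get redirected" version of the idea; the prefetch worry above is that consecutive guest addresses no longer guarantee the same physical locality after remapping.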

I think any inefficiencies introduced there will more than be made up for by the inherent efficiency of having a single memory pool where all memory accesses have the ability to leverage GDDR5-class bandwidth.
 
With the caveat that normally the Xbox One operates with 2 pools of memory: the GPU working with esram and DDR3, and the CPU with DDR3 only.

If memory contention is an issue on Xbox One and PS4, adding in this additional pool (which normally operates without such limitations and, more importantly, operates on its own) could exacerbate that issue greatly.

While I agree that your solution is likely closer to optimal, in moments when esram bandwidth is hitting 140GB/s-170GB/s (a combination of simultaneous reads and writes) I see the performance dropping significantly. Everything I've learned from this board has indicated to me that DDR bandwidth is chopped heavily in scenarios where a read is followed by a write followed by a read, or where reads and writes happen at the same time.

But 2x-3x more GPU performance could very well make up for it.
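The turnaround effect described above can be illustrated with a toy model. The burst counts and penalty cycles below are made up for the example, not measured DDR3 figures:

```c
#include <stdint.h>

/* Toy model of DRAM bus turnaround cost: each read<->write direction
 * switch wastes `penalty` bus cycles, so heavily alternating traffic
 * achieves far less than peak bandwidth. Illustrative numbers only. */
static double utilization(uint64_t bursts, uint64_t switches, uint64_t penalty)
{
    /* fraction of bus cycles spent on useful transfers */
    return (double)bursts / (double)(bursts + switches * penalty);
}
```

With 1000 bursts, pure streaming (zero switches) gives 100% utilization, while switching direction every 8 bursts (125 switches at 8 dead cycles each) cuts it to 50%, which is the "chopped heavily" behaviour being described.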
 
Excuse my early sarcasm. Now I understand why the cpu/gpu power delta was brought up. It wasn't as clear to me before.
Also, do you guys think it's possible to have some custom hardware doing some of that translation work and the DMA work, to free up the cpu from those tasks? That would still take less die area than esram, I believe.
 
There's definitely a lot of ways to approach it, and certainly a hardware assisted emulation could help. I do believe something could be created like you've mentioned but that's beyond my knowledge entirely.

I like Mcorbo's idea of just virtualizing the memory stack. His idea is the most straightforward, unless developers are doing some crazy hacks in esram, in which case I imagine crashes will result. It relies on having enough performance to overcome the limitations, and I suppose in a month or two we will know if there is enough.
 
I wonder how much simply "moving" data (copy then "destroying" the original) is used relative to other operations involving the two memory pools.

For example, simply moving data and not performing any kind of operation on it means a read, a write into the new location, then a read to do something with it, then another write. You could halve the bandwidth consumed in such a situation by simply reading in one pool, performing your operation, then writing to the other. I would assume that this is what developers would aim for when working out how to structure operations for optimal performance. Move with decompress appeared to be an interesting feature of the X1, but that would require a decompress to work, so you're going to be reading and writing again whatever memory arrangement you have.
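A quick back-of-envelope of the halving described above, under the simplifying assumption that every byte transferred costs exactly one read plus one write (real hardware adds caches and compression, so this is illustrative only):

```c
#include <stdint.h>

/* copy src pool -> dst pool, then read/modify/write in place */
static uint64_t bytes_move_then_operate(uint64_t n)
{
    return (n + n)       /* the move: read src, write dst      */
         + (n + n);      /* the operation: read dst, write dst */
}

/* fused: read from one pool, operate, write the result to the other */
static uint64_t bytes_read_process_write(uint64_t n)
{
    return n + n;        /* one read, one write                */
}
```

For any buffer size, the move-then-operate pattern touches memory exactly twice as much as the fused read->process->write pattern, which is why structuring operations that way would be the goal.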
 

The only time you would want to move data out of ESRAM without performing an operation on it is when you needed to move something else into ESRAM to perform an operation on it and needed to clear space. The limitation of only having 32MB at a time that can run at this higher bandwidth is another reason that I don't think that emulating it on a system with similar aggregate bandwidth that's available to all of its data would be a major problem.
 

Not necessarily ... if your motion blur or image reconstruction required you to keep the previous frame's colour, depth, and motion vector buffers [in dram], you'd be moving data out and not copying data in to replace it (you'd be generating the data filling the esram, not copying it in). You could have traffic that's all one way, from esram -> dram. I can actually imagine this being a pretty common scenario.

But my general musing was about how much of the work of emulating the esram could be bypassed using the virtualisation method you proposed (which I agree is clever and simple). My guess is it could save on some of the work of the move engines but not all (e.g. duplicating or decompressing), but that the move engines are responsible for only a small portion of the data flowing between the two pools (with most being read->process->write).
 
Whoa, great exchange of info, good brainstorming so far

The title should be changed again, or this last page deserves its own spawn lol
"How MS will tackle back-compat with future Xboxes", because we know they'll manage to do it one way or another.

Since the Xbox One API is a sibling of its PC counterpart (more so for future DX12 games, and even some early titles relied on barebones DX11), at worst they would just need to recompile the memory management part of their engines, right? I mean, most Xbox games use cross-platform engines, which run on PC and are not dependent on the kind of latencies the esram brings, right?
 
I told the doubters it could be done, but it's better to show them, or just have MS do it. Then it's much easier to understand.
 
Can special sauce be emulated through software, or will game food need to be recompiled?
 
That sounds a little too reasonable :p
But yeah; I never once viewed Xbox One as an engineering miracle; rather an off-the-shelf, memory-handicapped AMD APU. With "display planes".

With PS4 and emulation, there were some modifications to increase compute capacity. I'm guessing those modifications need to be in the PS4 Neo because they are actually there to increase performance. Like, if MS had added a 64MB 512GB/sec ESRAM pool to a 4/8GB GDDR5 Xbox One in 2013, then they would most certainly need it in the follow-up hardware.
 

Actually, the ESRAM was a clever solution as was SHAPE. Unfortunately for MS, they were clever solutions to problems that ended up not being problems. 8 GB of GDDR5 *was* possible and nobody ended up caring whether voice control of a console worked especially well.
 
SHAPE is a brilliant processor that just isn't fucking used for what it's meant for, yeah. Makes me sad.
 
I can't say 343i or any other Xbox exclusive has impressed upon me the advantage of such silicon. Given the audio direction of 343i, I can't say I'm surprised, but what of Remedy?

Was it worth it?

If Kinect was that reason then... I have nothing further to comment.
 
SHAPE is mostly for Kinect. For games it can resample audio and decode some formats. Dunno about resampling, but PS4 can decode some formats in hardware too.
 
Ignoring all the OT stuff, is there an actual list of PS2 games that work on the PS3 emulator? Also, how does their performance compare to the real thing?
 