XBox 360 Emulators for the PC

If previous console emulators are anything to go by, the custom GPU, the freedom to use it however desired, and any fancy CPU/GPU interaction will be huge hurdles as well. I wonder what this will look like in 10 years...

360 emulation isn't being explored by MS at all, is it?

That BUILD video seemed to indicate that it was possible & coming. Wait for the Q&A at the end. I'll believe it when I see it, though.

Tommy McClain
 
If previous console emulators are anything to go by, the custom GPU, the freedom to use it however desired, and any fancy CPU/GPU interaction will be huge hurdles as well. I wonder what this will look like in 10 years...

360 emulation isn't being explored by MS at all, is it?
It is! What's more, it's today's news...

Microsoft are planning on creating an Xbox 360 emulator themselves.

http://www.totalxbox.com/74592/repo...nned-for-xbox-one-claims-microsoft-executive/

Knowing that there is an Xbox emulator on the Xbox 360 (couldn't they just port it, btw?), they could pull it off. I wonder how, though. eDRAM bandwidth is like 256 GB/s, while the eSRAM has like 204 GB/s of bandwidth at its peak.
 
It is! What's more, it's today's news...

Microsoft are planning on creating an Xbox 360 emulator themselves.

http://www.totalxbox.com/74592/repo...nned-for-xbox-one-claims-microsoft-executive/

Knowing that there is an Xbox emulator on the Xbox 360 (couldn't they just port it, btw?), they could pull it off. I wonder how, though. eDRAM bandwidth is like 256 GB/s, while the eSRAM has like 204 GB/s of bandwidth at its peak.

Couldn't they just port the XBox emulator from XBox 360, to accomplish what exactly? XBox and XBox 360 emulation would have very little in common.

All they're saying now is they're exploring it, not that they've actually developed a viable solution. I'm not going to say it's impossible, since I would have never anticipated the level of emulation XBox 360 achieves. But I'm not going to accept it as a given either. MS at least knows it's hard.

To some extent, you can sort of make this stuff work on a per-game or group-of-games basis with enough HLE and special hacks, but that will turn into a nightmare of effort very quickly.

But those peak eDRAM vs eSRAM numbers really have little to do with anything. For one thing, the XBox 360 numbers only even make sense in the context of 4x MSAA, which isn't even used that much. And there's a variety of other ways in which XBox 360's GPU wastes bandwidth compared to what's in XB1.
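For reference, here's the usual back-of-envelope breakdown of that 256 GB/s figure (the ROP configuration below is my assumption, not an official spec, so treat it as a sanity check): it only adds up if every pixel is doing a 4x MSAA read-modify-write.

```c
/* Back-of-envelope reconstruction of the quoted 256 GB/s eDRAM figure.
   Assumed breakdown: 8 ROPs x 4 MSAA samples x (4 B colour + 4 B Z/stencil)
   per sample, doubled for read-modify-write, at the 500 MHz daughter-die clock. */
#include <stdio.h>

int main(void) {
    double bytes_per_clock = 8 * 4 * (4 + 4) * 2;   /* 512 bytes per clock */
    double clock_hz = 500e6;
    printf("eDRAM internal BW ~ %.0f GB/s (only with 4x MSAA blending/Z)\n",
           bytes_per_clock * clock_hz / 1e9);
    return 0;
}
```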
 
If DX12 is very similar or even identical across PC and XB1, then wouldn't that have the potential to make XB1 emulation on the PC very easy? Perhaps even easier than the 360? Especially when you consider the Win8 basis of its OS, the x86 CPU and the GCN 1.1 GPU.

The eSRAM might complicate it.
 
Xenia will likely be close to full compatibility before the XB1 gets an emulator:
an audience member asked if Microsoft had any plans to introduce Xbox 360 emulation to the new console.
Savage responded in the affirmative, although it may still be quite a while before we see it, saying "There are, but we're not done thinking them through yet, unfortunately. It turns out to be hard to emulate the PowerPC stuff on the X86 stuff. So there's nothing to announce, but I would love to see it myself."
 
It's probably really, really hard to emulate a 3.2 GHz PowerPC using low-power 1.6 GHz x86 cores, but Xenia can be used with (or may even require) >4 GHz cores with an IPC that is a lot higher than Jaguar's (Sandy Bridge and up).


Microsoft may be able to develop some kind of interpreter+compiler that takes the source code of the X360 version and automatically (with very little debugging?) adapts and compiles it for the Xbone, for publishers/developers to use.
But a real-time emulator done on the Jaguars? I doubt it.
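Just to give a feel for why pure interpretation is so costly (a toy sketch of my own, nothing to do with how Microsoft or Xenia actually do it): even a trivial interpreter burns dozens of host instructions on decode and dispatch for every single guest instruction.

```c
/* Toy PowerPC integer interpreter: just enough to show the per-instruction
   overhead of decode + dispatch. Purely illustrative; a real emulator would
   use a JIT/recompiler rather than a loop like this. */
#include <stdint.h>
#include <stdio.h>

static uint64_t gpr[32];                /* guest general-purpose registers */

static void step(uint32_t insn) {
    uint32_t op = insn >> 26;
    uint32_t rd = (insn >> 21) & 31, ra = (insn >> 16) & 31;
    switch (op) {
    case 14: {                          /* addi rD, rA, SIMM */
        int32_t simm = (int16_t)(insn & 0xFFFF);
        gpr[rd] = (ra ? gpr[ra] : 0) + simm;
        break;
    }
    case 31: {                          /* X/XO-form group */
        uint32_t rb = (insn >> 11) & 31, xo = (insn >> 1) & 0x3FF;
        if (xo == 266)                  /* add rD, rA, rB */
            gpr[rd] = gpr[ra] + gpr[rb];
        break;
    }
    default:
        fprintf(stderr, "unhandled opcode %u\n", op);
    }
}

int main(void) {
    /* addi r3, 0, 5 ; addi r4, 0, 7 ; add r5, r3, r4 */
    uint32_t prog[] = { 0x38600005, 0x38800007, 0x7CA32214 };
    for (unsigned i = 0; i < 3; i++) step(prog[i]);
    printf("r5 = %llu\n", (unsigned long long)gpr[5]);   /* prints 12 */
    return 0;
}
```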
 
A lot of how feasible this is depends on how much OS abstraction was present in XB360 games. In a normal OS you run in user mode and don't directly access hardware through the address space (anything that can cause implicit state changes/side effects beyond changing memory). If this is the case, emulation overhead is lowered, especially if it's cumbersome to fully utilize the host's MMU without interfering with its underlying OS.
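As a concrete (and entirely hypothetical) sketch of what that buys you: if titles only reach the hardware through kernel imports, the emulator can trap those imports and service them with native code instead of emulating the OS and hardware underneath. The ordinals and handlers below are made up for illustration.

```c
/* Hypothetical HLE dispatch: guest titles call OS imports by ordinal, and
   the emulator services them natively instead of emulating kernel code. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint64_t (*hle_fn)(uint64_t arg);

static uint64_t hle_alloc(uint64_t size) { return (uint64_t)(uintptr_t)malloc((size_t)size); }
static uint64_t hle_debug(uint64_t msg)  { puts((const char *)(uintptr_t)msg); return 0; }

/* import-ordinal -> native handler table (ordinals are invented here) */
static hle_fn import_table[] = { hle_alloc, hle_debug };

/* Called by the CPU emulator when guest code branches into an import thunk. */
static uint64_t dispatch_import(uint32_t ordinal, uint64_t arg) {
    return import_table[ordinal](arg);
}

int main(void) {
    dispatch_import(1, (uint64_t)(uintptr_t)"guest says hello");  /* HLE'd OS call */
    return 0;
}
```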

Also, it helps if self-modifying code requires communication with the OS, if such a thing is allowed at all (and Xenia's author claims it isn't...)

Consoles often allow pretty low-level access even where the programs run under a real OS, so this can be a lot to ask for.

I still think there'll be a lot of headache going from XBox 360's GPU to a typical 3D API or even Mantle.
 
It's probably really, really hard to emulate a 3.2 GHz PowerPC using low-power 1.6 GHz x86 cores, but Xenia can be used with (or may even require) >4 GHz cores with an IPC that is a lot higher than Jaguar's (Sandy Bridge and up).


Microsoft may be able to develop some kind of interpreter+compiler that takes the source code of the X360 version and automatically (with very little debugging?) adapts and compiles it for the Xbone, for publishers/developers to use.
But a real-time emulator done on the Jaguars? I doubt it.
I think it was bkilian :smile2: who said that the Xbox 360 CPU had a very low IPC, like a totally bad 20% :oops:.

This means that it ran at 640 MHz instead of 3.2 GHz, if I understand it correctly.

I hope they can pull it off. This way they could offer lots and lots of Games with Gold :) for Xbox One users without having to give away recent Xbox One games for free, which would be unfair to developers.

Xbox 360 has tons of games to choose from...
 
I think it was bkilian :smile2: who said that the Xbox 360 CPU had a very low IPC, like a totally bad 20% :oops:.

This means that it ran at 640 MHz instead of 3.2 GHz, if I understand it correctly.
IPC in that context is a rough overall average across the range of workloads Xenon ran.
The chip still ran at 3.2 GHz, but in many workloads it spent many cycles stalled or only issuing one instruction instead of the maximum of 2.

That is an average, however. There are likely very many places where it had even lower IPC, but there would also be certain spots where low-level optimizations could get it above.
Emulation of native instructions without any kind of recompilation or translation can drop performance by a factor of 10 or more, so Xenon does not suck enough to make Jaguar a win.

On top of that, an IPC advantage does not apply to code that lacks the instruction-level parallelism to allow more than one instruction per clock.
There is a subset of highly sequential workloads that raw clock speed can benefit, and there Jaguar falls seriously behind.
Though Xenon does not easily reach peak vector throughput, some code that does decently enough will still pose problems for Jaguar, because raw vector throughput is one area where Jaguar is not notably better.
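Putting rough numbers on that (illustrative figures only, using the ~0.2 IPC estimate above and a generous ~1 IPC for Jaguar running natively):

```c
/* Illustrative arithmetic only: the IPC figures and the 10x interpretation
   penalty are the rough numbers from this discussion, not measurements. */
#include <stdio.h>

int main(void) {
    double xenon_thread = 3.2e9 * 0.2;        /* ~0.64G guest instructions/s */
    double jaguar_core  = 1.6e9 * 1.0;        /* ~1.6G native instructions/s */
    double interpreted  = jaguar_core / 10.0; /* ~0.16G guest instructions/s */
    printf("Xenon thread %.2fG/s vs interpreting Jaguar core %.2fG/s: %.1fx short\n",
           xenon_thread / 1e9, interpreted / 1e9, xenon_thread / interpreted);
    return 0;
}
```

Which is why any realistic attempt would have to lean on some form of binary translation or recompilation rather than straight interpretation.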
 
For low-ILP code you can't just look at clock speed but at instruction latency... here Xenon isn't so good across the board, since even simple ALU instructions have a latency of two cycles, and load-to-use latency on L1 hits is 5 cycles.

Such a low average IPC of 0.2 is hard to conceive of for a superscalar processor, but at least on paper it had some really serious glass jaws.
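A contrived example of the kind of low-ILP code where those latencies, not clock speed, set the pace: a pointer-chasing loop advances at roughly one hop per load-to-use latency, so about 5 cycles per iteration on Xenon even with everything sitting in L1.

```c
#include <stddef.h>
#include <stdio.h>

struct node { struct node *next; };

/* Each iteration's load depends on the previous one: an in-order core like
   Xenon pays ~5 cycles of load-to-use latency per hop even on L1 hits,
   so dual issue and 3.2 GHz buy very little here. */
static size_t chase(struct node *n) {
    size_t hops = 0;
    while (n) { n = n->next; hops++; }
    return hops;
}

int main(void) {
    struct node nodes[4] = { { &nodes[1] }, { &nodes[2] }, { &nodes[3] }, { NULL } };
    printf("%zu hops\n", chase(nodes));   /* prints 4 */
    return 0;
}
```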
 
For low-ILP code you can't just look at clock speed but at instruction latency... here Xenon isn't so good across the board, since even simple ALU instructions have a latency of two cycles, and load-to-use latency on L1 hits is 5 cycles.
Given the order of magnitude or more hit for pure software emulation, that still leaves Jaguar as insufficient.
The straightline speed just isn't there, and code that is contorted to best fit the long latencies of Xenon multiplies the per-instruction cost of the emulator.
 
I'd actually be surprised if they couldn't get the 6 physical Jaguar cores to match the integer performance of 3 SMT Xenon cores in emulated scenarios. The real trick would be somehow getting the VMX vector code emulated via OpenCL on the GPU side. I don't know if they could extract the amount of parallelization required to make up the clock speed disparity and support its various use cases.
 
Given the order of magnitude or more hit for pure software emulation, that still leaves Jaguar as insufficient.
The straightline speed just isn't there, and code that is contorted to best fit the long latencies of Xenon multiplies the per-instruction cost of the emulator.

If user mode emulation applies in the way I described, and if most time is spent in user mode code or the OS code is HLE'd, the overhead for the CPU emulation part will be far under an order of magnitude. Even system mode emulation probably won't suffer that much if done decently. I doubt they'd need to count cycles, context switch too frequently, or explicitly check for interrupts.
 
IPC in that context is a rough overall average across the range of workloads Xenon ran.
The chip still ran at 3.2 GHz, but in many workloads it spent many cycles stalled or only issuing one instruction instead of the maximum of 2.

That is an average, however. There are likely very many places where it had even lower IPC, but there would also be certain spots where low-level optimizations could get it above.
Emulation of native instructions without any kind of recompilation or translation can drop performance by a factor of 10 or more, so Xenon does not suck enough to make Jaguar a win.

On top of that, an IPC advantage does not apply to code that lacks the instruction-level parallelism to allow more than one instruction per clock.
There is a subset of highly sequential workloads that raw clock speed can benefit, and there Jaguar falls seriously behind.
Though Xenon does not easily reach peak vector throughput, some code that does decently enough will still pose problems for Jaguar, because raw vector throughput is one area where Jaguar is not notably better.
Oh well, I thought that the CPU's bandwidth on Xbox One would be enough for that, because it's more than the entire Xbox 360 bandwidth (save the eDRAM), and it's unidirectional, not bidirectional (22 GB/s on X360, which means 11 GB/s write, 11 GB/s read).

Does the fact that the CPU is out-of-order help with the GHz performance difference on paper?

In addition, just like Rockster and shredenvain, I wonder if the extra grunt of the GPU couldn't take on some additional tasks where the CPU might have a hard time matching the original processing.

What Exophase says about such a low IPC being hard to conceive of makes me curious how they didn't find this out before the console was released.
 
I think it was bkilian :smile2: who said that the Xbox 360 CPU had a very low IPC, like a totally bad 20% :oops:.

This means that it ran at 640 MHz instead of 3.2 GHz, if I understand it correctly.

The overall IPC may not be the best, but the real problems occur when you've got instructions the PowerPC was really, really fast on.
 
Oh well, I thought that the CPU's bandwidth on Xbox One would be enough for that, because it's more than the entire Xbox 360 bandwidth (save the eDRAM), and it's unidirectional, not bidirectional (22 GB/s on X360, which means 11 GB/s write, 11 GB/s read).
IPC is not a measure of bandwidth. If a system is primarily bandwidth constrained, that restriction would be considered separately, even if it is technically true that the chip will stall if it has a lot of memory traffic pending.

Does the fact that the CPU is out-of-order help with the GHz performance difference on paper?
The common case is that it does--until it doesn't. There have to be additional non-dependent instructions available for the chip to execute out of order. This isn't always the case.
Retranslation, recompilation, or refactoring has to happen in cases where the CPU's finite reordering capabilities do not help, or the problem itself doesn't give enough extra work to reorder.
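A rough illustration of that point (generic C, not from any actual engine): the two functions below do the same work, but the first is one long dependency chain that no amount of out-of-order hardware can overlap, while the second exposes four independent chains it can.

```c
/* Same work, different ILP: 'serial' is a single dependency chain the
   reorder hardware cannot overlap; 'split' exposes four independent chains. */
#include <stdio.h>

#define N 1024

static float serial(const float *a) {
    float s = 0.f;
    for (int i = 0; i < N; i++) s += a[i];            /* each add waits on the last */
    return s;
}

static float split(const float *a) {
    float s0 = 0.f, s1 = 0.f, s2 = 0.f, s3 = 0.f;
    for (int i = 0; i < N; i += 4) {                  /* four independent chains */
        s0 += a[i]; s1 += a[i + 1]; s2 += a[i + 2]; s3 += a[i + 3];
    }
    return (s0 + s1) + (s2 + s3);
}

int main(void) {
    static float a[N];
    for (int i = 0; i < N; i++) a[i] = 1.f;
    printf("%f %f\n", serial(a), split(a));           /* both print 1024.0 */
    return 0;
}
```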

What Exophase says about such a low IPC being hard to conceive of makes me curious how they didn't find this out before the console was released.
IPC in this case is a measured value; you don't find out what it is until the chip exists to be measured for a true reading.
It was not, however, that much of a surprise.
IBM, at least, would have seen it coming to a reasonable degree of accuracy.
 
There are plenty of cases where processors didn't do so well because software ended up not looking like the CPU designers expected it to. This was true to some extent with Pentium 4 and more so with Itanium.

Sony had already been gung-ho on a lot of raw vector processing power over more general-purpose CPU power; this is apparent in the PS2 as well. I guess they thought most game code amounted to, or could amount to, this.

MS then poached Sony's main core, which was probably the most realistic option IBM could give them at the time.
 
Microsoft at the very least wanted an OoO core, but they didn't have the time to design and validate it.
A cheap, fast-to-market, high-clock processor with a high vector peak is what IBM was tasked to create. While IBM was stuck on a later-reversed design direction that would lead to POWER6, the many flaws and the clunky memory pipeline of the PPE and Xenon were things IBM had decades of experience validating and modeling.
IBM wasn't surprised by this outcome.

In the case of Cell, IBM didn't initially want that design, either.
 
There are plenty of cases where processors didn't do so well because software ended up not looking like the CPU designers expected it to. This was true to some extent with Pentium 4 and more so with Itanium.

Sony had already been gung-ho on a lot of raw vector processing power over more general-purpose CPU power; this is apparent in the PS2 as well. I guess they thought most game code amounted to, or could amount to, this.

MS then poached Sony's main core, which was probably the most realistic option IBM could give them at the time.

Itanium is an odd one to bring up in this context, as it was a design driven as much by business decisions as anything else. Intel was never hot on extending x86 to 64 bits, as they wanted to move commodity servers to Itanium and restore their hegemony there. AMD extended x86, and commodity server users refused to pay the massive costs of refactoring their software (or completely redesigning it, in most cases) for VLIW and Itanium. Intel caved, adopted AMD64 (yay cross-licensing agreements) and competed the right way by making a better x86-64 architecture.

At that point Itanium was an expensive, bespoke architecture that Intel wasn't willing to spend billions on anymore, as most customers for expensive, bespoke server architectures already had a huge investment in their rivals' tech and little appetite for the s/w engineering required to move. Intel's obvious disappointment with the Itanium project led to rumours of its early demise, which either became self-fulfilling when customers refused to invest in a 'risky' platform or were simply true, depending on how you look at it.

OT: I thought the PowerPC cores in both designs were fairly similar, bar the differing approaches to vector units (1 core + SPUs versus 3 cores w/ AltiVec)?
 