It's straightforward if you don't care about performance. Optimizing the code to perform as well as possible on the target platform is a very deep problem.
1) You're looking at native code performance. You'll be lucky to average 50% of that with translated code, and that's assuming fairly generous properties of the code and the emulator. The binary translator simply doesn't have access to the same level of structure and information the original compiler had, and it pays for that, especially when the original code was written for an arch with twice as many registers. It also pays for having to manage branch targets that aren't known statically.
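To illustrate that last point, here's a minimal sketch of why statically unknown branch targets cost extra in a binary translator: a guest indirect branch (e.g. a PPC bctr) only resolves its target at run time, so the translated code has to look the guest address up in a cache of already-translated blocks instead of jumping directly. All names here are hypothetical, not anything from MS's actual emulator.
[code]
#include <cstdint>
#include <unordered_map>

using HostBlock = void (*)();                        // entry point of a translated block

std::unordered_map<uint32_t, HostBlock> blockCache;  // guest PC -> host code

// Provided by the rest of the (hypothetical) emulator: translates the block at guestPC.
HostBlock translateBlock(uint32_t guestPC);

// Called for every indirect branch the translator could not resolve statically.
HostBlock dispatchIndirect(uint32_t guestTarget)
{
    auto it = blockCache.find(guestTarget);
    if (it != blockCache.end())
        return it->second;                           // hot path: target already translated
    HostBlock block = translateBlock(guestTarget);   // slow path: translate it now
    blockCache[guestTarget] = block;
    return block;
}
[/code]
Every such branch becomes a hash lookup (or at best an inline-cached guess) instead of a plain jump, which is one of the places the translated code loses ground to native code.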
Microsoft wrote the compiler that generated the X360 opcodes, so they are probably the best placed to build something AST-like from those opcodes and generate an optimized binary for a new target platform. That's what drivers do with shader bytecode (which nowadays is optimized for a totally different "imaginary" processor). That's also what .NET does with its bytecode, and what NVidia's "Denver" does. That's why I'd call it 'state of the art' in the high-end compiler business.
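A rough sketch of the pipeline being described, purely for illustration: guest opcodes are decoded into an AST/IR, the usual optimizer passes run on that IR, and an x86-64 backend emits native code. The types and function names below are made up, not Microsoft's actual implementation.
[code]
#include <cstddef>
#include <cstdint>
#include <vector>

// A single IR node; a real translator would carry operands, types, flags, etc.
struct IrOp { enum Kind { Load, Store, Add, Mul, Branch } kind; };

std::vector<IrOp> decodePpcBlock(const uint32_t* opcodes, std::size_t count);  // front end
void optimize(std::vector<IrOp>& ir);                                          // regular compiler passes
std::vector<uint8_t> emitX64(const std::vector<IrOp>& ir);                     // backend / code gen

// "Transcoding" a guest block: decode -> IR -> optimize -> emit native code.
std::vector<uint8_t> transcodeBlock(const uint32_t* guestCode, std::size_t count)
{
    std::vector<IrOp> ir = decodePpcBlock(guestCode, count);  // PPC/VMX opcodes -> IR
    optimize(ir);                                             // scheduling, folding, peepholes, ...
    return emitX64(ir);                                       // IR -> x86-64 machine code
}
[/code]
The point is that once you have the code in IR form, you can reuse the same backend machinery a compiler uses, instead of translating instruction by instruction.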
I agree that some information is missing, but at the same time that information wouldn't be of much help, because the data is already laid out with that information in mind, e.g. alignment, padding of struct members, endianness, etc.
2) That performance comparison was probably done using GCC, versus what production code would have used, which was IBM's compiler, and that one matured over time.
if you're referring to the PPU vs x86 benchmarks, your assumptions are not correct.
e.g.
http://web.archive.org/web/20100531...kpatrol.ca/2006/11/playstation-3-performance/
(that's not the comparison I recall, just random 1min of google)
You have the exact same situation now, Microsoft VC++ vs Microsoft VC++; I doubt the X360 compiler would be more advanced than the optimizers they use for the transcoder now.
3) The common 2x1.6GHz claim is very misleading. First of all, you get full access to the CPU if only one thread is running. Second, it's SMT: there's some contention, but it's nowhere near a halving of performance; that would defeat the point.
There are several things to consider:
a) It's an in-order CPU. On an OoO core both threads can try to keep the execution units filled while some memory fetches are stalling, whereas on an in-order design a fetch stall completely stalls that one pipeline. Running both hardware threads gives better occupancy on the instruction side, but per SMT thread it causes more contention and friction, e.g. on the L1D side, which is what's actually critical for the stalls.
b) If a game does not utilize 6 threads, it's likely there was no need to, so I'd strongly assume it's not a critical code path. If it was critical, it's likely spread across more cores for more throughput overall, but less throughput per core. That favors the real cores of the XB One.
I have no hard numbers to back my assumption, though.
Nonetheless, the performance per thread is nowhere close to 100% of what it is when only one thread is running, so there's definitely some reprieve for code that heavily multithreads the cores. BUT this assumes they don't rely on a high degree of synchronization for performance or correctness, one that the emulator would probably not be able to provide while running the threads on separate cores (and if it has to constantly switch threads on the same core to get the same effect, performance will tank). That's kind of the thing here: with so few games supported right now, we don't know what kind of compatibility potential it has.
I agree. And yes, I'm wildly guessing here. I don't claim it is that way; from my experience it's just what I'd assume is most likely. MS is a great [edit: not OS, I meant] COMPILER company, and software-emulating opcodes at runtime, or even a 1:1 translation, doesn't sound to me like it would perform as well as a transcoded binary.
But the other side of this is that games don't have to be using 100% CPU time on XB360 and many probably remain GPU limited or even frame time limited (especially the XBLA games) despite the CPU being so weak.
Makes me curious to see some more recent games that made the XB360 sweat.
Is there anything like that?
Heavily optimized Altivec code will indeed be hard to deal with, especially because it has so many registers. You're going to get inner loops that routinely blow the 16-XMM-register budget. The emulator will probably have to access registers in RAM heavily to make up for this.
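A minimal sketch of what that looks like, assuming the translator keeps the guest vector register file in memory (a common approach, not necessarily what MS does): Xenon's VMX128 has far more vector registers than the 16 XMM registers SSE exposes, so only a few hot guest registers can live in real XMM registers at a time.
[code]
#include <cstdint>
#include <emmintrin.h>   // SSE2 intrinsics (__m128i)

// Hypothetical layout: the guest's vector register file lives in RAM.
struct alignas(16) GuestVectorFile {
    __m128i vr[128];     // VMX128 register set, far larger than 16 XMM registers
};

// A translated inner loop can only map a handful of hot guest registers to
// real XMM registers; everything else turns into explicit fills and spills.
void translatedLoopBody(GuestVectorFile& g)
{
    __m128i v0 = _mm_load_si128(&g.vr[0]);   // fill hot guest registers from RAM
    __m128i v1 = _mm_load_si128(&g.vr[1]);
    v0 = _mm_add_epi32(v0, v1);              // the actual translated work
    _mm_store_si128(&g.vr[0], v0);           // spill the result back to RAM
}
[/code]
Those extra loads and stores around every loop iteration are exactly where hand-tuned Altivec code loses its edge after translation.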
Which again makes it more likely that they go the complex route of parsing the opcodes into an AST and running the full VC++ backend for x86. It's not just register renaming and instruction translation; quite a bit of code would be done in a different way (e.g. a load, modify, store sequence on the XB360 could end up as one instruction like "inc memory").
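As a toy example of that kind of rewrite, here's a hypothetical peephole pass over a simplified IR that collapses a load + add-immediate + store to the same address into one read-modify-write node, which an x86 backend could then emit as a single memory-operand instruction. The IR and names are made up for illustration.
[code]
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified IR for the illustration.
struct IrOp {
    enum Kind { Load, AddImm, Store, Rmw } kind;
    uint32_t addr;   // memory address for Load/Store/Rmw
    int32_t  imm;    // immediate for AddImm/Rmw
};

// Collapse Load(addr) + AddImm(imm) + Store(addr) into one Rmw node,
// which the backend can emit as "add [addr], imm".
std::vector<IrOp> fuseLoadModifyStore(const std::vector<IrOp>& in)
{
    std::vector<IrOp> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (i + 2 < in.size() &&
            in[i].kind == IrOp::Load && in[i + 1].kind == IrOp::AddImm &&
            in[i + 2].kind == IrOp::Store && in[i].addr == in[i + 2].addr) {
            out.push_back({IrOp::Rmw, in[i].addr, in[i + 1].imm});
            i += 2;                      // consume the whole pattern
        } else {
            out.push_back(in[i]);        // keep everything else as-is
        }
    }
    return out;
}
[/code]
You only get this kind of rewrite by working on an IR; a 1:1 instruction translator would faithfully reproduce the three-instruction sequence.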
But there's been tons of presentation material on optimizing XB360 and PS3 CPU code that go far beyond just using well scheduled Altivec, so it's pretty safe to say that a lot of major games were heavily optimized throughout and would do a lot better than the vanilla C++ comparison you gave.
I agree. That's what I wanted to point out as well. There will be some 10% of the code, e.g. physics, AI, etc., that is really critical, heavily optimized, and used in time-critical situations. So in those moments (e.g. combat) not only is there way more pressure, the code is also harder to translate. This might explain why in non-critical cases (e.g. maybe cutscenes) the emulated version might run way better, while (as some claim) in action moments the FPS drops to 10fps. (Again, my wild guess.)
Makes me wonder whether MS might have a farm of programmers profiling critical code bits and rewriting them for x86 (at least as C/C++ code with SSE intrinsics), and whether we'll get patches further improving game performance by some 2x, 3x, 4x in those low-FPS situations.