And maybe a simple OOE version of Xenon could be developed.
Just don't try to optimize everything.
Also add some power saving techniques.
Again, a full redesign could be needed, but in the end the result could be very interesting.
I am not sure what you are aiming to achieve: the best possible single core performance, or the best possible throughput?
If you are looking for a new version of Xenon, with simple in-order cores and a heavy focus on having as high a thread count as possible (6 threads back then was 3x higher than any x86 CPU could achieve), IBM has already done that. That chip is called PowerPC A2. It has 16 cores, and each core can execute 4 threads simultaneously. In total that's 64 threads. Getting the best performance out of it would likely require optimization techniques similar to those used for Xenon (however, this should be slightly easier, as there are now 4 threads per core instead of 2, so more TLP, thread level parallelism, can be exploited automatically). A throughput monster can surely be created that way (but it requires software designed to run with 64 threads to get good performance out of it).
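To put a rough number on the "software must be designed for 64 threads" point, here is a minimal sketch using Amdahl's law (my addition, not something the A2 documentation states). It assumes, optimistically, that each of the 64 hardware threads contributes a full unit of throughput, which SMT threads sharing a core generally do not; the function name and the sample parallel fractions are purely illustrative.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup over one thread when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


CORES = 16             # from the post above
THREADS_PER_CORE = 4   # from the post above
HW_THREADS = CORES * THREADS_PER_CORE  # 64 hardware threads

# Illustrative parallel fractions, not measurements of any real workload.
for p in (0.50, 0.90, 0.99):
    speedup = amdahl_speedup(p, HW_THREADS)
    print(f"{p:.0%} parallel: {speedup:.1f}x on {HW_THREADS} threads")
```

Under these assumptions, code that is half serial cannot even reach a 2x speedup, no matter how many threads the chip offers.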
Since this thread's title is "Bring back high performance single core CPUs already!", the highly threaded PowerPC A2 route seems to be the complete opposite. If you want good single threaded performance, you have to exploit ILP (instruction level parallelism) within a single thread as well as possible. Simple in-order cores are not good at that. You basically have to handle the ILP extraction statically at compile time (Intel tried to go that route with Itanium, and we all know how that ended up). So if you want the best possible single thread performance, you have to go the way Intel did with its newest designs (Core/Sandy Bridge/Ivy Bridge). Spend a huge number of transistors on parts other than pure execution units. Have deep OOO execution, exploit ILP as much as possible, fight against all stall cases to keep the pipelines occupied, focus on buffering/caches and memory latencies, etc. This is basically the opposite of simple in-order execution. Focus on having a clever core that utilizes all of its execution resources as much as possible (as opposed to a brute force core that stalls often and does nothing).
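As a toy illustration of what "ILP in a single thread" means, the sketch below computes an upper bound on instructions per cycle as the instruction count divided by the longest dependency chain. It assumes unit latency for every instruction, unlimited execution units, and no cache misses, so it says nothing about any real core; the helper name `ideal_ilp` and the two tiny instruction lists are hypothetical.

```python
def ideal_ilp(deps: dict[str, list[str]]) -> float:
    """Instruction count divided by the longest dependency chain."""
    depth: dict[str, int] = {}

    def chain_length(instr: str) -> int:
        # Length of the longest chain that ends at this instruction.
        if instr not in depth:
            depth[instr] = 1 + max(
                (chain_length(d) for d in deps[instr]), default=0
            )
        return depth[instr]

    critical_path = max(chain_length(i) for i in deps)
    return len(deps) / critical_path


# s = ((a + b) + c) + d : every add waits for the previous one.
serial_sum = {"t1": [], "t2": ["t1"], "t3": ["t2"]}

# s = (a + b) + (c + d) : the first two adds are independent.
tree_sum = {"t1": [], "t2": [], "t3": ["t1", "t2"]}

print(ideal_ilp(serial_sum))  # 1.0
print(ideal_ilp(tree_sum))    # 1.5
```

Both versions do the same three additions, but only the second gives the hardware (or the compiler, on an in-order core) anything to run in parallel. Finding and reordering that independent work at run time is what the OOO machinery spends its transistors on.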
Intel's recent CPUs seem to be constrained by heat, not by clock rate. Turbo clocks are near 4 GHz, and the pipeline stages still work just fine at that speed. A single core version could likely run at 5 GHz all the time. It wouldn't break any power efficiency records (as power usage grows very quickly as you increase clocks), but it would certainly beat everything in single threaded performance. If they wanted to go down that road, they could widen the core a bit (more execution units to handle ILP peaks) and just scale things up, as the transistor budget wouldn't be anywhere near a limiting factor anymore. But again, all this would decrease power efficiency, as reaching beyond the sweet spot (in many areas) would increase performance only slightly while increasing the transistor count (and heat production) dramatically. The question really becomes: is there a large demand for a single core that has 2x the performance (or likely less) of the current high end cores, but eats as much power as four of them (halving the throughput at equal TDP)?
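The trade-off can be checked with some back-of-the-envelope arithmetic. The sketch below uses the textbook dynamic power relation P ≈ C·V²·f plus the common rule of thumb that voltage has to rise roughly in proportion to frequency, which makes power grow with about the cube of the clock. That cubic rule is my assumption, not a figure from Intel, and the 2x performance / 4x power numbers are simply the hypothetical ones from the question above.

```python
def relative_dynamic_power(base_ghz: float, target_ghz: float) -> float:
    """Power ratio under P ~ C * V^2 * f, assuming V scales linearly with f."""
    return (target_ghz / base_ghz) ** 3


def perf_per_watt(performance: float, power: float) -> float:
    return performance / power


# Rough cost of running at 5 GHz instead of 4 GHz (rule-of-thumb model only).
print(f"{relative_dynamic_power(4.0, 5.0):.2f}x power for 1.25x clock")

# Hypothetical chips in units of "one current high end core".
# The quad core figure assumes perfectly parallel work.
quad_core = perf_per_watt(performance=4.0, power=4.0)
big_single = perf_per_watt(performance=2.0, power=4.0)

print(f"big single core delivers {big_single / quad_core:.0%} "
      f"of the quad core's throughput at equal TDP")
```

With these inputs the big single core delivers half the throughput for the same power, and only for workloads that actually keep four cores busy; for purely single threaded code it is still twice as fast.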