Full transcript of John Carmack's QuakeCon 2005 Keynote

london-boy said:
I mean of course In-Order CPUs are slower than Out-Of-Order when running code written for Out-Of-Order CPUs!

Just in case it has not been repeated enough: there is no such thing as code written for "Out-Of-Order CPUs." Both CPUs run the same code. Being able to execute out of order is an OPTIMIZATION!

They run exactly the same code; the OoO CPU will just be slightly less sensitive to BAD code.

Even if you tuned the hell out of your code so that it would run well on an in-order CPU, the same code would still run far, far faster on a CPU that could execute out of order.

The ONLY reason these console CPUs are not OoO is to save transistors.

Fewer transistors = smaller core = cheaper to make + less heat dissipated + higher clock speeds possible.

They traded off a very good optimization technique to improve clock rate and make room for a few more cores.

Real-time code, like the game's 3D renderer, is already going to be highly optimized, especially when written by an experienced programmer like Carmack. When Carmack says his code is running at half the speed on an in-order CPU, it's not because he "has not optimized it" for in-order use. It is simply because in-order CPUs are slower, period.

And this is no shocker. Both MS and Sony knew this. But they are hoping that giving developers several slower CPUs will give a better price/performance ratio than a single fast one. Theoretically, this should be possible. But in reality, some people disagree, including Carmack apparently.
 
I don't have the knowledge to argue for either side, but I still don't understand the comments about the consoles being equal to a top-end PC, because if that is true, why would they have invested so much in them, both the CPUs and Xenos?
We can accept that the PS3 had TVs and such to justify Cell, but then why would MS also create a clean-sheet CPU and GPU? :???: :?: I am sure that this time they would have been able to get a nice agreement for the CPU and GPU anyway, so they could sell them cheap too.

It would also be easier to code for, keep compatibility, and keep costs somewhat lower...

I really don't understand why.
 
I think Mr. Carmack's statements were more political than technical...

The closest analogy that I can offer is...


Let's say Mr. Carmack has a maxed out, decked out, uber character for EverQuest...

We're talking best across the servers...

Then a new game, World of Warcraft, comes in the picture...

People started clamouring all over WoW because they feel that it's a better game...

With much consideration...

and careful analysis...

including non-stop pestering from his friends to switch over...

he still declines...

because it would mean that he would have to throw away his uber character in EverQuest and start WoW with a measly level 1 guy...


-just my two cents-
 
pc999 said:
I still don't understand the comments about the consoles being equal to a top-end PC, because if that is true, why would they have invested so much in them, both the CPUs and Xenos.

This was answered somewhat by Carmack himself in this quote.

"hardware people like peak numbers. You always want to talk about what’s the peak triangle, the peak fill rate, and all of this, even if that’s not necessarily the most useful rate. We suffer from this on the CPU side as well, with the multi-core stuff going on now" - Carmack


The X2 CPU and especially the Cell were designed for very high peak rates. On paper this all looks well and good, but in reality it's expected to be a pain in the ass to get even halfway towards those peak numbers.

But of course consoles are a fixed platform with a fairly long life span of 4-5 years. This means that as developers learn more and more about the HW, they will learn new optimization tricks and be able to squeeze out more and more performance. Maybe in the 3rd or 4th gen titles we will start seeing games that really work out the HW.
 
LunchBox, that's a really bad analogy. He's already developing/dabbling on the X360 and possibly PS3. Someone of his capability certainly isn't shying away from the new consoles.
Not really sure what point you are trying to make.
 
LunchBox said:
I think Mr. Carmack's statements were more political than technical...

The closest analogy that I can offer is...

Let's say Mr. Carmack has a maxed out, decked out, uber character for EverQuest...

We're talking best across the servers...

Then a new game, World of Warcraft, comes in the picture...

People started clamouring all over WoW because they feel that it's a better game...

With much consideration...

and careful analysis...

including non-stop pestering from his friends to switch over...

he still declines...

because it would mean that he would have to throw away his uber character in EverQuest and start WoW with a measly level 1 guy...

-just my two cents-

I wouldn't go that far -- when your understanding of programming is as deep as Carmack's, programming on a console or on a PC all sort of ends up being the same. When you like to play in the registers everything just starts becoming a blur anyway... I don't have any doubts that Carmack would be quite the console programmer if he liked what they had over the PC (and he seems to like the Xbox 360 and said his next engine will be console/PC from the get-go -- like UE3 I guess). I think he's just trying to soften the blow of marketing hype by being a bit overly pessimistic at times (although he also says they are great platforms and have a lot of power). It seems to me he's being rather realistic about the situation; I certainly didn't gather that he hated consoles -- just that he didn't really like the hype that had been produced and he was trying to cut through it a bit. The HD Era is not the greatest thing since sliced bread.
 
LunchBox said:
I think Mr. Carmack's statements were more political than technical...

The closest analogy that I can offer is...
Let's say Mr. Carmack has a maxed out, decked out, uber character for EverQuest...
We're talking best across the servers...
Then a new game, World of Warcraft, comes in the picture...
People started clamouring all over WoW because they feel that it's a better game...
With much consideration...
and careful analysis...
including non-stop pestering from his friends to switch over...
he still declines...
because it would mean that he would have to throw away his uber character in EverQuest and start WoW with a measly level 1 guy...
-just my two cents-


Actually, to understand his POV, you have to ONLY think about FPS games.

He doesn't care about a factor of 10 increase in physics compute ability, because in an FPS game your viewport is small and some 60% of your immediate area is going to be behind you. All objects that are off screen but in the area still have to be simulated, even the ones that are not moving. If you did dedicate a considerable amount of CPU toward physics in an FPS game, you're going to have to concede that ~60% of that CPU work the player is never even going to see. In Carmack's POV those cycles could be better used elsewhere.

Of course there are other game genres where dedicating a lot of CPU toward physics will have a very positive effect. But those games are not what Carmack wants to make.

Same with AI. A good FPS single-player AI opponent is merely an entertainer or performer in front of the player. The better FPS games are the ones where the developers apply a lot of tender loving care to their characters' behaviors so that they appear to do interesting things once in a while. They do this by defining more unique behaviors and creating scripted reactions to certain events. The bad FPS games are of course the ones where the AI can't even do correct pathfinding (you know who you are), but I think we are past this. Currently, "good AI" in an FPS game is basically just a function of the number of man-hours you dedicate to it. Dedicating a whole lot more CPU to it is not going to make it better.

Of course there are other game genres where dedicating a lot of CPU toward AI will have a very positive effect. But those games are not what Carmack wants to make.
 
inefficient said:
Actually, to understand his POV, you have to ONLY think about FPS games.

He doesn't care about a factor of 10 increase in physics compute ability, because in an FPS game your viewport is small and some 60% of your immediate area is going to be behind you. All objects that are off screen but in the area still have to be simulated, even the ones that are not moving. If you did dedicate a considerable amount of CPU toward physics in an FPS game, you're going to have to concede that ~60% of that CPU work the player is never even going to see. In Carmack's POV those cycles could be better used elsewhere.

Of course there are other game genres where dedicating a lot of CPU toward physics will have a very positive effect. But those games are not what Carmack wants to make.

Same with AI. A good FPS single-player AI opponent is merely an entertainer or performer in front of the player. The better FPS games are the ones where the developers apply a lot of tender loving care to their characters' behaviors so that they appear to do interesting things once in a while. They do this by defining more unique behaviors and creating scripted reactions to certain events. The bad FPS games are of course the ones where the AI can't even do correct pathfinding (you know who you are), but I think we are past this. Currently, "good AI" in an FPS game is basically just a function of the number of man-hours you dedicate to it. Dedicating a whole lot more CPU to it is not going to make it better.

Of course there are other game genres where dedicating a lot of CPU toward AI will have a very positive effect. But those games are not what Carmack wants to make.



Thank you!!!

that actually made a lot of sense to me :)

I appreciate your input :)
 
inefficient said:
This was answered somewhat by Carmack himself in this quote.

"hardware people like peak numbers. You always want to talk about what’s the peak triangle, the peak fill rate, and all of this, even if that’s not necessarily the most useful rate. We suffer from this on the CPU side as well, with the multi-core stuff going on now" - Carmack


The X2 CPU and especially the Cell were designed for very high peak rates. On paper this all looks well and good, but in reality it's expected to be a pain in the ass to get even halfway towards those peak numbers.

But of course consoles are a fixed platform with a fairly long life span of 4-5 years. This means that as developers learn more and more about the HW, they will learn new optimization tricks and be able to squeeze out more and more performance. Maybe in the 3rd or 4th gen titles we will start seeing games that really work out the HW.

That would mean they are faster but their performance is hard to get at; what he says is that they are not faster, yet their performance is still hard to get at. Hence my doubt.
 
inefficient said:
Even if you tuned the hell out of your code so that it would run well on an in-order CPU, the same code would still run far, far faster on a CPU that could execute out of order.
I don't believe that. OOO is about filling in gaps in the execution pipeline with other instructions not following the order of the code. Now something written specifically for in-order execution won't have many gaps, meaning OOO won't have much scope to find and execute extra instructions. On generic code I expect OOO to be a good deal faster than an IO core, but on IO optimized code all things being equal there shouldn't be a lot of difference. Personally I think the simpler core with fast and reasonably substantial local storage should manage some turbo-performance on IO code that your typical OOO cores won't be able to match.
 
I hate to say it, but this thread seems to be divided between people who work as programmers and have a great deal of understanding of the hardware and software issues involved, versus those who are more interested in playing games. Strangely, the former seem to identify with the issues which Carmack raises whilst the latter seem to want to believe in the marketing hype. In the end it's a question of, "Who do you trust the most?"
 
Shifty Geezer said:
I don't believe that. OOO is about filling in gaps in the execution pipeline with other instructions not following the order of the code. Now something written specifically for in-order execution won't have many gaps, meaning OOO won't have much scope to find and execute extra instructions. On generic code I expect OOO to be a good deal faster than an IO core, but on IO optimized code all things being equal there shouldn't be a lot of difference. Personally I think the simpler core with fast and reasonably substantial local storage should manage some turbo-performance on IO code that your typical OOO cores won't be able to match.


But isn't "something written specifically for in-order execution" somewhat of a pipe dream? Sure, there are going to be some routines that can be written in a way that reduces/eliminates data dependencies and other things that would otherwise stall the IO core. But the amount of code you will be able to hand-optimize will be tiny. And even after hand optimization, an OoO CPU will still have the opportunity to make further optimizations. Which is why I said faster.

Also, the compiler is already trying its darndest to keep all the instruction units busy. It would be interesting to see numbers on the performance increase just from the compiler's instruction reordering and optimizations.
 
As far as I know there are no "optimizations" for out-of-order execution - I can't think of any situation in which it would be advantageous to intentionally increase logical/data dependencies... why would you ever want to increase the amount of out-of-order execution the processor has to do?
In their defense, there are still things people do in PC game engines that would be total hell on next-gen consoles: branching, brazen memory access, jumping around in memory. There are little things people do to avoid extra computation (e.g. passing pointers around) where it would probably be faster on next-gen consoles to just recompute the darn things on the spot. It's often hilarious when you see two computational paths in two opposite branches, but inlining the whole thing out to do both paths and a conditional move at the end triples the speed of the function.

People just seem to think about it as "code for OOOE platforms" as opposed to "code for in-order platforms." Really, back when x86 was in-order, things weren't all that different, but CPUs were inherently slow back then, so avoiding computation was often still cheaper than the cost of hitting memory. With next-gen, things will have to be different, and what most people don't seem to be getting with their "in-order/OoO" comments is that there's more than just that one reason why it has to change. The differences don't really lie in the code that's produced by the compilers so much as in the code produced by the humans.
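To illustrate the branch-elimination point above, here is a minimal C++ sketch (the falloff function is a hypothetical example, not anything from the thread or from id's code): compute both paths and select the result, so there is no branch for a long in-order pipeline to mispredict.

```cpp
// Branchy version: on a deeply pipelined in-order core, a mispredicted
// branch here can cost more than the arithmetic it was trying to skip.
float falloff_branchy(float dist, float radius)
{
    if (dist < radius)
        return 1.0f - dist / radius;   // inside the radius: linear falloff
    return 0.0f;                       // outside: no contribution
}

// Branch-free version: evaluate both results and pick one. Compilers will
// typically turn the ternary into a conditional move / select instruction,
// so both paths are "inlined out" and nothing is mispredicted.
float falloff_branchless(float dist, float radius)
{
    float inside = 1.0f - dist / radius;
    return (dist < radius) ? inside : 0.0f;
}
```

Whether this is actually a win depends on how much work sits in each path, which is presumably why the post singles out cases where doing both sides still triples the speed of the function.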
 
london-boy said:
I'm not sure why there's such a doom and gloom atmosphere. Maybe perhaps next-gen CPUs won't run code made for other CPUs fast (which is obviously true for any CPU, I mean of course In-Order CPUs are slower than Out-Of-Order when running code written for Out-Of-Order CPUs! Like N64 code was slow to be emulated on any other platform till not too long ago and still today!), and obviously the limited RAM will be a big hurdle for PC developers.

Listen people. Code isn't written for in-order CPUs or OoO CPUs. It is written in C or C++, etc. The compiler then processes said code and applies optimizations. The optimizations that get the most performance out of an out-of-order micro-architecture are also the optimizations that get the most performance out of an in-order CPU.

In other words, if processor B is slower on recompiled code than processor A then it is a fairly good sign that processor A is just plain faster.

Drop all this nonsense of code "written for out-of-order CPUs" or "OoO code". It doesn't exist and never has.

Aaron Spink
speaking for myself inc.
 
inefficient said:
Just in case it has not been repeated enough: there is no such thing as code written for "Out-Of-Order CPUs." Both CPUs run the same code. Being able to execute out of order is an OPTIMIZATION!

They run exactly the same code; the OoO CPU will just be slightly less sensitive to BAD code.

Actually the primary benefit of an OoO engine is the ability of the micro-architecture to respond much better to short-time-scale dynamic events, plus the ability to handle short-term latencies much better (generally L2 load-to-use latencies). Another advantage is that OoO uArchs tend to have better memory-level parallelism, but this is a side effect of being able to handle short-term latencies. In general, an OoO micro-architecture will also be highly sensitive to bad/un-optimized code.
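As a rough C++ sketch of the memory-level parallelism point (the data structures here are hypothetical, not from the thread): independent loads give an OoO core several cache misses to keep in flight at once, while a dependent pointer chase gives neither kind of core anything to overlap.

```cpp
struct Node { float value; Node* next; };

// Pointer chasing: each load's address depends on the previous load's
// result, so the cache misses serialize on any core, OoO or in-order.
float sum_list(const Node* n)
{
    float s = 0.0f;
    while (n) {
        s += n->value;
        n = n->next;   // can't start this load until the previous one returns
    }
    return s;
}

// Independent accesses: the misses on a[i] and b[i] don't depend on each
// other, so an OoO core can overlap several of them. An in-order core only
// gets that overlap if the loads are scheduled well ahead of their uses.
float sum_arrays(const float* a, const float* b, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += a[i] + b[i];
    return s;
}
```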

Aaron Spink
speaking for myself inc.
 
Shifty Geezer said:
I don't believe that. OOO is about filling in gaps in the execution pipeline with other instructions not following the order of the code. Now something written specifically for in-order execution won't have many gaps, meaning OOO won't have much scope to find and execute extra instructions. On generic code I expect OOO to be a good deal faster than an IO core, but on IO optimized code all things being equal there shouldn't be a lot of difference. Personally I think the simpler core with fast and reasonably substantial local storage should manage some turbo-performance on IO code that your typical OOO cores won't be able to match.

Believe what you want. But understand the reality, which is that with equivalent execution resources, an OoO micro-architecture will result in higher performance even with code heavily tuned for a prior in-order micro-architecture.

As an example, binaries heavily tuned for Alpha EV5 ran significantly faster on EV6.

The primary benefit of an OoO uArch isn't re-optimizing the instruction stream or filling gaps, but the ability to keep executing through short stalls caused by things like L1 cache misses.

There has been a LOT of research in this area and generally OoO will give 30+% increases in performance, all things being equal.


Aaron Spink
speaking for myself inc.
 
ShootMyMonkey said:
In their defense, there are still things people do in PC game engines that would be total hell on next-gen consoles: branching, brazen memory access, jumping around in memory. There are little things people do to avoid extra computation (e.g. passing pointers around) where it would probably be faster on next-gen consoles to just recompute the darn things on the spot. It's often hilarious when you see two computational paths in two opposite branches, but inlining the whole thing out to do both paths and a conditional move at the end triples the speed of the function.

Branching: How are you going to get around branching? What? Predicated execution? Funny!

Memory access: What, you don't think memory accesses hurt on modern PC processors? The access latencies in cycles are as high if not higher on the PC side vs. the new console processors. If anything, memory accesses hurt MORE on current PC processors.

Jumping around in memory: Not sure how this is any different from memory access, but I would assert that this hurts as badly if not worse on modern PC processors.

Aaron Spink
speaking for myself inc.
 
Ok so essentially OOO just hides the flaws in unoptimized code much better than IO.

Either way the code is still unoptimized, but that just has a much greater impact on an IO CPU than an OOO CPU.

So any "optimizations", i.e. loop unrolling, that you do to the code will speed it up on both an OOO and an IO processor, since it's just faster code, period. It will just have a greater impact, and is much more necessary, on an IO processor.
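For what it's worth, here is a minimal sketch of the loop unrolling mentioned above (a made-up dot product, not code from the thread): breaking the single dependency chain into several independent accumulators lets the pipeline overlap the multiply-adds, which helps on any CPU but matters most on an in-order one.

```cpp
// Simple loop: every add feeds the next one, so a long FPU latency leaves
// pipeline bubbles that an in-order core cannot fill on its own.
float dot(const float* a, const float* b, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}

// Unrolled by four with separate accumulators: the four chains are
// independent, so their multiplies and adds can overlap in the pipeline.
float dot_unrolled(const float* a, const float* b, int n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i + 0] * b[i + 0];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)   // mop up any leftover elements
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}
```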

does that basically sum it up?

Q: How does having 6 threads and 3 cores available help alleviate stalls in the CPU? It seems to me that having these 6 threads available would be one of the main ways you could try to keep your CPU busy between stalls. For example, if one thread is stalled out waiting for a memory request, one of the other 5 threads can step in and get something done?

Doesn't the solution to the IO problem lie in the ability to have multiple threads? And is it just a matter of the devs mastering the multi-threading capabilities to solve the problems of having an IO CPU?
 
Ok so essentially OOO just hides the flaws in unoptimized code much better than IO.

No, it's not so much about "bad" code. That's not really what makes a big difference; as Aaron said, it's all about NOT stalling. You can't get around, say, a load, because without data your program doesn't do anything.

For instance, as soon as there is a load instruction you're waiting for the data to be fetched; if you're lucky it's in L1, if you're not it's in main memory. Either way, you're going to burn cycles. This is where an OoO core might be able to find instructions in the scheduling window to execute in the meanwhile.
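A tiny C++ sketch of that gap, with made-up names (and with the caveat that a compiler will often do this scheduling itself on such small functions): issuing the load early and doing independent work while it is in flight is exactly the hole an OoO core fills automatically and an in-order core leaves as a stall.

```cpp
struct Enemy { float health; };

// Load-then-use back to back: an in-order core sits through the full
// load-to-use latency (or a cache miss) before the subtract can issue.
// An OoO core would pull the later, independent multiply into that gap.
float apply_damage_naive(Enemy* e, float dmg, float scale)
{
    e->health -= dmg;                   // load e->health, use it immediately
    float bonus = dmg * scale * 0.5f;   // independent work, done too late
    return e->health + bonus;
}

// Scheduled for in-order: start the load first, do the independent
// arithmetic while the data is still on its way, then use it.
float apply_damage_scheduled(Enemy* e, float dmg, float scale)
{
    float health = e->health;           // issue the load early
    float bonus = dmg * scale * 0.5f;   // covers part of the load latency
    health -= dmg;
    e->health = health;
    return health + bonus;
}
```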
 
Is there no way to queue threads so that they are poised to execute if the CPU stalls waiting for a memory request?

I would guess the answer is yes, but the larger problem is HOW to multi-thread a game engine, period.

Once you devs have nailed down exactly the best ways to parallelize game code, is it not fairly straightforward after that?

Let me phrase that another way: if you wrote an engine that managed to keep at least 3-5 threads busy at all times, would you then be able to use those threads to keep your IO CPU from wasting cycles?
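As a minimal sketch of that idea (hypothetical names, and using C++11 std::thread for brevity even though the 2005-era consoles had their own threading APIs): give each hardware thread its own slice of independent per-frame work, so whenever one thread stalls on memory the core's other thread, and the other cores, still have something queued up. A real engine would keep a persistent pool of worker threads rather than spawning them every frame.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-object update; in a real engine this slot would be
// filled by physics, animation, particles, and so on.
struct GameObject { float pos, vel; };

static void update(GameObject& o, float dt) { o.pos += o.vel * dt; }

// Split one batch of independent updates across num_threads workers.
void update_all(std::vector<GameObject>& objects, float dt, unsigned num_threads)
{
    std::vector<std::thread> workers;
    const std::size_t chunk = (objects.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end   = std::min(objects.size(), begin + chunk);
        workers.emplace_back([&objects, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                update(objects[i], dt);   // each worker touches its own slice
        });
    }
    for (auto& w : workers)
        w.join();   // wait for every slice before the frame moves on
}
```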
 