JC Keynote talks consoles

Laa-Yosh said:
Okay, so we're talking about the guy, who - with Michael Abrash - wrote a software 3D renderer that basically fit into the cache of the Pentium processor? And as far as I know he had ideas like using BSP to process the game levels' geometry, use lightmaps for static lighting, or even things like smooth scrolling on a PC with Commander Keen, fast raycasting based 2.5D engine, and so on... So, is this the same John Carmack we're talking about? Because then I'm probably interpreting things a bit differently...

Actually, it's not really the same guy we are talking about. That was John 10-20 years ago. He's certainly got a lot of accomplishments under his belt, but he's not John Carmack the tireless, young, startup-company programmer anymore. He's John Carmack, the older, more well-rounded Technical Director of id Software. I don't doubt that he is still a great programmer, *especially* when dealing with things like BSP, raycasting, etc. What I do doubt is his authority on parallel computing, given that his only brief stint with it seems to be the lackluster implementation (which was later removed) in Quake 3.

I don't want you to take away from this message that I think John is incapable of writing multithreaded code, or that he's incompetent. It is just that what John says basically dismisses the work of hundreds of very, very smart engineers at Toshiba/IBM/Sony who have spent their entire careers designing and working on multithreaded processor designs and algorithms.

In this specific case, it just seems as though there are other people in the field that may have a more qualified opinion than John does.

Nite_Hawk
 
Nite_Hawk said:
What I do doubt is his authority on parallel computing, given that his only brief stint with it seems to be the lackluster implementation (which was later removed) in Quake 3.
IIRC, Quake 3 got a 40% speed up from going to a dual CPU machine. That's close to the expected peak for making an arbitrary app dual threaded. The rule of thumb I've heard is the square root of the number of processors. So 2 processors should be 1.41x faster, 4 processors 2x faster, etc.

And it was removed because OS and hardware issues made the benefit not work on several setups. It wasn't that he wasn't getting good enough performance from it.
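
(Just to put that rule of thumb in rough numbers, here's a minimal sketch comparing the square-root heuristic against Amdahl's law. The parallel fraction below is an assumption I picked to land near the ~40% dual-CPU figure, not anything from the keynote.)

```cpp
#include <cmath>
#include <cstdio>

// Square-root rule of thumb: speedup ~ sqrt(number of processors).
double rule_of_thumb(int n) { return std::sqrt(static_cast<double>(n)); }

// Amdahl's law: speedup = 1 / ((1 - p) + p / N), where p is the
// fraction of the work that actually runs in parallel.
double amdahl(double p, int n) { return 1.0 / ((1.0 - p) + p / n); }

int main() {
    const double p = 0.6; // assumed parallel fraction, chosen to land near ~1.4x on 2 CPUs
    const int counts[] = {2, 4, 8};
    for (int n : counts) {
        std::printf("%d CPUs: rule of thumb %.2fx, Amdahl (p=%.1f) %.2fx\n",
                    n, rule_of_thumb(n), p, amdahl(p, n));
    }
}
```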


I don't want you to take away from this message that I think John is incapable of writing multithreaded code, or that he's incompetent. It is just that what John says basically dismisses the work of hundreds of very, very smart engineers at Toshiba/IBM/Sony who have spent their entire careers designing and working on multithreaded processor designs and algorithms.
Whoah, there. Are you comparing hardware engineers to software engineers? Because I don't think the bulk of the engineers at those companies have actually tried to do what Carmack did. They're working on chips, not debugging race conditions.

And I'm pretty sure Carmack is not dismissing their work. I didn't get that sense from the video, anyway.
 
Nite_Hawk said:
In this specific case, it just seems as though there are other people in the field that may have a more qualified opinion than John does.

One of the key points that Carmack has brought up was that some of that "more qualified opinion" is driven by a marketing department's desire to boost a product's image by very high performance numbers, which are actually just theoretical peak rates.
And as the actual engineers working at those companies probably haven't had as much influence on the criticized decisions as they would've liked to, it might as well be that they share Carmack's opinion, but their job does not allow them to get vocal about it...
 
i wonder what JC thinks about the different bandwidth setups for each console. esp interested to know what he thinks about edram for the gpu.
 
Well, EDRAM sounds like an ideal hardware solution for his (at least previously) preferred method of heavily multipassed rendering...
 
Well, I was under the impression that consoles provide the benefit of being able to exploit a platform's strengths, while PCs are dominated by software layers such as OpenGL that provide compatibility but reduce efficiency considerably. Is this not true?
Well, it's not always true. Xbox certainly hid things from you and you had no choice but to work behind an API layer. Though I never did anything with PS1 (was in middle school at the time), it was supposedly the case there as well. But the thing in either case was that you had enough hardware power to get by without having to worry too much about it.

Still, it's up to you to really decide how you want to use it. When do you push data across, how do you want to group your packets, optimizing everything that goes on in software as opposed to just the graphics pipe... keeping everything that costs a lot down to a minimum. That part of optimization and getting "down to the metal" is the same no matter the platform. Granted, what costs a little and what costs a lot will be different from platform to platform, but the point of actually having to deal with it is what's the same.

Don't assume that low-level = ASM. Low-level simply means at the point closest to the hardware, and that can still be in high-level code that controls when you push packets across the DMA, and buffers stuff off for later use. We had to use a hell of a lot of ASM for computationally heavy portions that were frequently repeated back when clock speeds were in the 10s of MHz range, but not so much nowadays.
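
(As a rough illustration of that point, here's a hedged sketch of the kind of high-level batching you end up doing. Nothing below is a real console API; the Packet layout, the kick_dma() stub, and the batch size are all made up for illustration.)

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw packet; the fields and layout are illustrative only.
struct Packet {
    uint32_t state;         // render state bits
    uint32_t vertexOffset;  // offset into a vertex pool
    uint32_t vertexCount;
};

// Stand-in for whatever actually pushes data across the bus/DMA on a given
// platform; a real engine would call the platform API here.
void kick_dma(const Packet* packets, std::size_t count) {
    (void)packets;
    (void)count;
}

class PacketQueue {
public:
    // Group small submissions so the hardware sees a few big transfers
    // instead of many tiny ones.
    void submit(const Packet& p) {
        queue_.push_back(p);
        if (queue_.size() >= kBatchSize)
            flush();
    }

    void flush() {
        if (queue_.empty()) return;
        kick_dma(queue_.data(), queue_.size());
        queue_.clear();
    }

private:
    static constexpr std::size_t kBatchSize = 256; // arbitrary illustrative threshold
    std::vector<Packet> queue_;
};

int main() {
    PacketQueue q;
    for (uint32_t i = 0; i < 1000; ++i)
        q.submit(Packet{0u, i * 64u, 64u});
    q.flush(); // push whatever is left at the end of the frame
}
```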

Oftentimes, I think these ideals of making a tight, efficient pipeline can clash with the ideal of making solid, flexible tools. That difference is probably the big reason why Epic is doing so well.

I wonder why Sony/MS went for in-order CPUs if the performance is so far below conventional ones.
Again, it's not just in-order. It's in-order, poor branch prediction, long pipelines, small caches, and high-latency memory, but with high theoretical throughput. Fundamentally, you could take a high-clocked dual-core A64 and get something quite powerful, but you'd have a $1200 console. Next-gen Neo-Geo, anybody? And even then, the raw throughput would have been, what, 10 GFLOPS? If we can figure out dual-threading, I think we'd hit a wall at 10 GFLOPS soon enough, considering that current-gen hardware already includes a CPU triplet that approaches a theoretical limit of 6 GFLOPS. Using simple hardware means more functional hardware in place for less cost.

As much as the new cores are capable of their hundreds of GFLOPS, that's only in an ideal instance. We're light years away from that. The good news is that even if we ultimately achieve, say, 10% efficiency on a CELL, that's a huge amount of power -- that's more than even the theoretical limits of any dual-core PC CPU right now. Pretty good bargain, when you think about it that way. The problem is that, in code terms, the difference between what we get now and 10% efficiency is beyond enormous.

In this specific case, it just seems as though there are other people in the field that may have a more qualified opinion than John does.
I didn't get any impression that he was talking about how parallel programming works -- it was more like he was addressing the issues with game code in parallel programming. And he's certainly qualified to talk about game code.

As far as parallel game code goes, I don't think there is such a thing as an expert in the area. Concurrent programming in general, sure, but applying top-end theoretical research to something where the entire execution of the outer loop has to fit in a time budget on the order of 16/33 ms... not the simplest thing.
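
(For a feel of what that budget means, here's a minimal sketch of an outer loop that farms two systems out to worker threads and checks the frame against a 16.7 ms budget. The job split and the sleep-based "work" are my own invented illustration, not anyone's engine.)

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Stand-ins for the per-frame systems; each just burns a little time here.
void update_physics()        { std::this_thread::sleep_for(std::chrono::milliseconds(4)); }
void update_ai()             { std::this_thread::sleep_for(std::chrono::milliseconds(3)); }
void build_render_packets()  { std::this_thread::sleep_for(std::chrono::milliseconds(5)); }

int main() {
    const double budget_ms = 1000.0 / 60.0; // ~16.7 ms at 60 Hz (33.3 ms at 30 Hz)

    for (int frame = 0; frame < 3; ++frame) {
        auto start = std::chrono::steady_clock::now();

        // Run independent systems on their own threads for this frame...
        std::thread physics(update_physics);
        std::thread ai(update_ai);
        build_render_packets(); // ...while the main thread keeps feeding the GPU.
        physics.join();
        ai.join();

        double elapsed_ms = std::chrono::duration<double, std::milli>(
            std::chrono::steady_clock::now() - start).count();
        std::printf("frame %d: %.1f ms (%s budget)\n", frame, elapsed_ms,
                    elapsed_ms <= budget_ms ? "within" : "over");
    }
}
```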
 
Heh. Quake 3 is one of, if not the only, SMP-capable game out right now. But if I remember correctly it was of limited benefit, due to the GPU-limited nature of systems even at resolutions like 800x600.
 
Almasy said:
So, is there no possible way to optimize code for in-order CPUs? That's what I was trying to refer to. I wonder why Sony/MS went for in-order CPUs if the performance is so far below conventional ones.

Price, area, FLOPs P*nis measuring, etc.

Neither Intel nor AMD were likely to sell MS or Sony a custom-designed dual- or triple-core device for much below $100, if that. Both AMD and Intel are running at or near capacity AFAIK and have no intention of reducing their profit to go after the low-margin console business.

The only other OoO option is the IBM Power4/5, which runs at a lot higher power than what MS/Sony wanted.

So they get IBM's next-generation embedded core, which is kinda small, cheap ('cause IBM needs the business BAD), and not really that fast. But they can put all these big SIMD engines on it, which gets the FLOPs number up. That will work for some small code segments, but doesn't help with a lot of code.

And yes, code can be optimized. But the same things that optimize code for in-order CPUs also give exactly the same benefit to OoO CPUs.
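
(A small, hedged example of what that means in practice: turning a data-dependent branch in a hot loop into a branchless select helps the in-order core the most, but never hurts the OoO one. The loop and data below are purely illustrative.)

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>

// Branchy version: a data-dependent branch inside a hot loop stalls an
// in-order core badly on a mispredict, and still costs an OoO core.
float clamp_sum_branchy(const float* v, std::size_t n, float maxVal) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        if (v[i] > maxVal)
            sum += maxVal;
        else
            sum += v[i];
    }
    return sum;
}

// Branchless version: the select compiles down to conditional-move/min style
// code, which benefits both kinds of core.
float clamp_sum_branchless(const float* v, std::size_t n, float maxVal) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += std::min(v[i], maxVal);
    return sum;
}

int main() {
    const float v[] = {0.5f, 2.0f, 1.5f, 0.25f};
    std::printf("branchy: %.2f, branchless: %.2f\n",
                clamp_sum_branchy(v, 4, 1.0f), clamp_sum_branchless(v, 4, 1.0f));
}
```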

Aaron Spink
speaking for myself inc.
 
Inane_Dork said:
IIRC, Quake 3 got a 40% speed up from going to a dual CPU machine. That's close to the expected peak for making an arbitrary app dual threaded. The rule of thumb I've heard is the square root of the number of processors. So 2 processors should be 1.41x faster, 4 processors 2x faster, etc.

And it was removed because OS and hardware issues made the benefit not work on several setups. It wasn't that he wasn't getting good enough performance from it.

In terms of speed the implementation was probably about as good as you can expect, especially given the other constraints like the video card. In terms of SMP support being removed, we will probably know more once the Q3 source is released. What specific OS/Hardware issues are you thinking of? Are you just basing this off what he said in the keynote?

Whoah, there. Are you comparing hardware engineers to software engineers? Because I don't think the bulk of the engineers at those companies have actually tried to do what Carmack did. They're working on chips, not debugging race conditions.


I think when you are talking about a field as specific as parallel computing you are going to end up with a lot of knowledge about both hardware and software in the same company. Certainly John seems to feel comfortable commenting on both the software and hardware aspects of these systems. I don't think it unreasonable to hold his opinions next to those of both the hardware and software engineers working on Cell.

And I'm pretty sure Carmack is not dismissing their work. I didn't get that sense from the video, anyway.

Carmack doesn't outright dismiss anything, but he certainly seems to downplay any significance there is. The feeling I get from him is that he thinks of both Cell and the Xbox 360 CPU as just being hard-to-program incremental improvements on what we had before. Personally, that seems like dismissing their work to me. The Cell, and to a lesser extent the Xenon processor, seem a lot more revolutionary than evolutionary. Whether or not they will fulfill their potential is up for contention, but so long as Carmack is complaining about them, I doubt it will be him that leads any revolutions on these new processors.

Nite_Hawk
 
Nite_Hawk said:
What specific OS/Hardware issues are you thinking of? Are you just basing this off what he said in the keynote?
Yeah, it's from the keynote. It was the classic "it worked on my machine" problem. :p


I think when you are talking about a field as specific as parallel computing you are going to end up with a lot of knowledge about both hardware and software in the same company.
I really don't think that's reasonable. The skills it takes to synchronize caches across a bus and arrange transistors on silicon are not that comparable to the skills it takes to design an effective parallel system and debug it. At least, those two skill sets seem very different to me.


Certainly John seems to feel comfortable commenting on both the software and hardware aspects of these systems. I don't think it unreasonable to hold his opinions next to those of both the hardware and software engineers working on Cell.
Cell engineers have a vested interest in promoting one thing. Carmack does not. Plus, to me anyway, he only seems to comment on hardware as it affects his ability to program.


Carmack doesn't outright dismiss anything, but he certainly seems to downplay any significance there is. The feeling I get from him is that he thinks of both Cell and the Xbox 360 CPU as just being hard-to-program incremental improvements on what we had before. Personally, that seems like dismissing their work to me. The Cell, and to a lesser extent the Xenon processor, seem a lot more revolutionary than evolutionary.
I think what he's pointing out is that he's going to have to work significantly harder to attain the same level of performance on Cell & XeCPU that he can get on a high-end Athlon or P4. And that's reasonable, IMO. It may be downplaying Cell and XeCPU, but maybe they should be downplayed.


Whether or not they will fulfill their potential is up for contention, but so long as Carmack is complaining about them, I doubt it will be him that leads any revolutions on these new processors.
Carmack is not complaining. I did not get that sense at all.
 
There are things you can do on Cell which you simply wouldn't have the horsepower for on a current x86... consoles have a long lifetime, and some developers will suck it up and get good performance out of it eventually, just as he said.
 
Inane_Dork said:
And that's reasonable, IMO. It may be downplaying Cell and XeCPU, but maybe they should be downplayed.

Can you tell me why they should be downplayed? What have they (they being Sony and MS) done to deserve this?

MfA said:
There are things you can do on Cell which you simply wouldn't have the horsepower for on a current x86... consoles have a long lifetime, and some developers will suck it up and get good performance out of it eventually, just as he said.

What things are you talking about? I heard people on this very board say things like this months ago, but now the mood has changed. :-?
 
These discussions are absolutely ridiculous. The technical people on this board have basically corroborated John's premises and generally agree with his conclusions, yet we have people on the other side bringing up all sorts of oddball concerns about how John's attacking something.

What's ludicrous is the assertion that he's somehow disrespecting their work; he's not. He's merely saying, with what has been delivered, this is what he expects to happen. The "huge" gains that were being droned on about by the marketing departments just aren't feasible for many situations.

To sum it up, there were areas where we've taken a step back, and those areas mean that the programmers, not the hardware guys, have the onus on them for hitting performance marks, more so than ever. This isn't a case of programmers being lazy; in fact, I'd put it on the hardware guys.
 
Saem said:
These discussions are absolutely ridiculous. The technical people on this board have basically corroborated John's premises and generally agree with his conclusions, yet we have people on the other side bringing up all sorts of oddball concerns about how John's attacking something.

What's ludicrous is the assertion that he's somehow disrespecting their work; he's not. He's merely saying, with what has been delivered, this is what he expects to happen. The "huge" gains that were being droned on about by the marketing departments just aren't feasible for many situations.

To sum it up, there were areas where we've taken a step back, and those areas mean that the programmers, not the hardware guys, have the onus on them for hitting performance marks, more so than ever. This isn't a case of programmers being lazy; in fact, I'd put it on the hardware guys.

Saem,

We've had technical discussions in the past. Asserting that all of the technical people on this board basically agree with John is like saying anyone who disagrees with him isn't technical. That's pretty insulting.

John is a smart guy. I've said that and I will continue to say that. On the other hand, I'm sure he knows how much weight his words carry. For the majority of the population, the tone of his message conveys about as much weight as whatever words he actually says. He is like Alan Greenspan in this respect. People will analyze every nuance of what he says. If he really means to convey that "the systems are great", he should have started out saying that and not devoted the majority of his speech to what he believes to be their weaknesses. You may see it as a "reality check", but he's certainly willing to keep his mouth shut when it suits him.

Having said all of that, I think he has a number of valid points (these systems *are* going to be hard to code for, and *are* going to take a lot of work to extract performance from). He is still just another coder, though. A good one, one that should be respected, but he's not the only authority out there. People here seem to think that any criticism of John is like questioning the Pope. He's not infallible.

Nite_Hawk
 
Saem said:
This isn't a case of programmers being lazy; in fact, I'd put it on the hardware guys.
Or more specifically the business guys. What hardware designer wouldn't want the best of both worlds? Fast single threaded performance and multiple threads. Cost and schedules get in the way of these things sometimes.
 
mckmas8808 said:
What things are you talking about?
Anything which requires more than the peak performance of a current x86 processor.

Specifically, semi-regular problems to eat up lots of flops... the obvious ones are physics and view-dependent tessellation. Speculatively, adaptive z-buffer shadows maybe (requires quite a bit of CPU interaction to determine the sampling density needed for a given screen tile). If you have SPEs sitting idle you can find ways to use them if you try hard enough, even if it is just to take pressure off the GPU... it's not like Doom 3 exactly did all its work on the GPU.
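
(As a rough sketch of what "finding work for idle cores" looks like in the generic case: the code below just splits a regular, FLOP-heavy loop across a plain worker thread while the main thread keeps the GPU fed. It is not real SPE/libspe code, and all the names are made up for illustration.)

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Purely illustrative "job": integrate a batch of particles. On Cell, this kind
// of regular, FLOP-heavy loop is the sort of thing you'd push to an SPE; here
// it's just a plain worker thread.
void integrate_particles(std::vector<float>& pos, const std::vector<float>& vel,
                         float dt, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        pos[i] += vel[i] * dt;
}

void build_render_commands() { /* main thread keeps the GPU fed */ }

int main() {
    const std::size_t n = 1 << 16;
    std::vector<float> pos(n, 0.0f), vel(n, 1.0f);
    const float dt = 1.0f / 60.0f;

    // Hand the back half of the batch to an otherwise-idle core...
    std::thread worker(integrate_particles, std::ref(pos), std::cref(vel), dt, n / 2, n);
    // ...while the main core does the front half plus the GPU-facing work.
    integrate_particles(pos, vel, dt, 0, n / 2);
    build_render_commands();
    worker.join();
}
```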
 
Nite_Hawk,

you're taking a technical matter, turning it into a political one, and then interpreting things all funky.

Carmack seemed fine in his tone and approach. The criticisms leveled against his "comments" were either against poor interpretations of what he said or just poor, for the most part.

I'm not 100% with him on everything either, but it has gotten silly; this pathetic effort to attack the guy on a personal level isn't adding anything. When the argument starts bringing up his history and so on, it's off topic and entirely non-technical. It's, "look at me, I can piss on him and not address what he's really saying." This is exactly what happened: arguments started buckling and people dug up his "track record".

No offense to you, you're an excellent poster, but in this particular vein not so much.
 
Or more specifically the business guys. What hardware designer wouldn't want the best of both worlds? Fast single threaded performance and multiple threads. Cost and schedules get in the way of these things sometimes.

Possibly. I remember hearing that Sony and Toshiba rejected IBM's idea of merely reworking their Power series to meet the design goals.
 
mckmas8808 said:
Can you tell me why they should be downplayed? What have they (they being Sony and MS) done to deserve this?
I must question what you've seen and heard about Cell and XeCPU. If it's the same as what I've seen and heard, I'm not sure what's disagreeable with my comment.

What I've seen and heard about the two is a LOT of theoretical peak numbers: FLOPs, cores, cache sizes, etc. And I've heard a bit about some practical peak numbers from tech demos. The key defining point about these things is that they're asking the question, "How much can I get if I shape my program to the finest nuances of the hardware?" The worst part about this is that added power is often promoted as enabling new gameplay, as if developers start with GFLOPs and figure out what they can make with them.

Carmack's statements are more the result of the question, "How much can I get if the hardware runs my game?" The answers to these two questions are going to be different. Now, of course, developers push their games toward the first question, but they pretty much all start with the second.

So yes, the theoretical specs should be downplayed. Assume a more reasonable utilization of all cores and a more reasonable IPC and we get something far more useful. For instance, to say that you can get 30 GFLOPs out of the XeCPU from a game-like scenario (number pulled from my hat) is more interesting to people who want to know what they'll actually get out of these machines (that is, what they'll be playing).
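
(To make that theoretical-versus-practical gap concrete, here's a tiny sketch of the arithmetic: peak = cores x FLOPs per cycle x clock, then derated by an assumed sustained efficiency. Every constant in it is an illustrative assumption, not a vendor spec.)

```cpp
#include <cstdio>

int main() {
    // All numbers below are illustrative assumptions, not official specs.
    const double cores         = 3.0;   // assumed core count
    const double flopsPerCycle = 12.0;  // assumed SIMD FLOPs per core per cycle
    const double clockGHz      = 3.2;   // assumed clock
    const double efficiency    = 0.25;  // assumed sustained fraction of peak in game-like code

    const double peakGflops      = cores * flopsPerCycle * clockGHz;
    const double sustainedGflops = peakGflops * efficiency;

    std::printf("theoretical peak:  %.1f GFLOPS\n", peakGflops);
    std::printf("assumed sustained: %.1f GFLOPS (%.0f%% of peak)\n",
                sustainedGflops, efficiency * 100.0);
}
```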
 