Gabe Newell on NextGen systems

mckmas8808 said:
Ok, for a video game console, what can outdo these two chips right now?

A Pentium 4, a modern graphics card, a really nice DSP, I don't know... A lot of things might perform better than those two chips if you make them run tasks tailored for other processors.

And I don't think Intel is dropping out-of-order-execution from their future chips.
 
Is he even correct in asserting that "in-order CPUs" are some big industry "move"? Isn't it more accurate to say that basically every processor that appeared in a console before the XB and GC used some sort of in-order design (the PS2 included)? If so, then it's really the XB and GC that "upset" the industry with OoO CPUs, and next generation we are simply seeing a return to the previous status quo.

I'm not saying that one is better than the other, but if the console industry went from in-order to OoO (for merely one generation) and then back again, who are we to say that in-order is some sort of blight upon the console landscape? It seems to have served consoles well in the past, and apparently the potential benefits were enough for the manufacturers to consider going back to their roots. So I imagine this is hardly an issue for veteran console developers; rather, it's whiny bemoaning from newer developers who cut their teeth on the XB and GC and are fretting that they might actually have to learn something new outside their native sandbox.
 
Johnny_Physics said:
mckmas8808 said:
Ok, for a video game console, what can outdo these two chips right now?

A Pentium 4, a modern graphics card, a really nice DSP, I don't know... A lot of things might perform better than those two chips if you make them run tasks tailored for other processors.

And I don't think Intel is dropping out-of-order-execution from their future chips.

Okay, so how much would that actually cost? I assume WAY more than what Sony and MS are paying. I wasn't talking about unlimited funds here; maybe I should have added that. Within a comparable cost, what is out today, or coming out within a year, that could outdo what the PS3 and X360 are going to do?
 
mckmas, you shouldn't comment on what you lack knowledge about. These are valid points, and they will be a pain for PC devs getting used to writing code for in-order CPUs.

The higher development costs will come mainly from the increased art costs, so that is what most devs should be crying out about.
 
I might be stepping on a lot of toes by saying this, but the majority of the complaints about coding difficulty seem to be coming from the PC game developers. The transition seems to be more difficult for PC developers, who are used to desktop CPUs doing a lot of the housekeeping for them.

His point about hardware manufacturer performance claims is dead on, of course, but everyone already knows peak perf. != actual perf.
 
nondescript said:
I might be stepping on a lot of toes by saying this, but the majority of the complaints about coding difficulty seem to be coming from the PC game developers.

That's exactly it, lazy PC development teams who've been sitting on the "RAM is cheap!" bandwagon for years whining about writing efficient code for once.
 
Mfa said:
If you are writing in a HLL, the fact that it is in-order doesn't concern you in the least.
It does when any access outside the small L1 D-cache has a 30-40 cycle latency or higher. On the positive side, many of us have worked at least for some time with this kind of setup over the last 5 years.
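To make that concrete, here's a minimal C++ sketch (my own, not from the thread): both loops do the same work, but the linked-list version chases pointers, so on an in-order core every dependent load that misses the cache simply stalls the pipeline for the full latency, while the contiguous array streams through the cache predictably. That difference bites you even in a "high-level" language.

```cpp
// Minimal sketch of why memory access patterns still matter in a HLL on an
// in-order core: each pointer hop in the list is a potential cache miss the
// CPU cannot hide by reordering work; the array walks memory sequentially.
#include <cstddef>
#include <cstdio>
#include <vector>

struct Node {
    float value;
    Node* next;
};

// Pointer-chasing traversal: every load depends on the previous one, so an
// in-order pipeline stalls for the full miss latency on each hop.
float sum_list(const Node* head) {
    float total = 0.0f;
    for (const Node* n = head; n != nullptr; n = n->next)
        total += n->value;
    return total;
}

// Contiguous traversal: addresses are predictable, so sequential cache-line
// fills and prefetching hide most of the latency.
float sum_array(const std::vector<float>& values) {
    float total = 0.0f;
    for (float v : values)
        total += v;
    return total;
}

int main() {
    std::vector<Node> nodes(1000);
    std::vector<float> values(1000);
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        nodes[i].value = values[i] = 1.0f;
        nodes[i].next = (i + 1 < nodes.size()) ? &nodes[i + 1] : nullptr;
    }
    std::printf("list: %.1f  array: %.1f\n", sum_list(&nodes[0]), sum_array(values));
    return 0;
}
```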
 
_leech_ said:
nondescript said:
I might be stepping on a lot of toes by saying this, but the majority of the complaints about coding difficulty seem to be coming from the PC game developers.

That's exactly it, lazy PC development teams who've been sitting on the "RAM is cheap!" bandwagon for years whining about writing efficient code for once.

Well, "lazy" isn't exactly the word I would use, because anyone making a living writing code, is, almost by definition, not lazy. But they do have a significant transition challenge.
 
nondescript said:
_leech_ said:
nondescript said:
I might be stepping on a lot of toes by saying this, but the majority of the complaints about coding difficulty seem to be coming from the PC game developers.

That's exactly it, lazy PC development teams who've been sitting on the "RAM is cheap!" bandwagon for years whining about writing efficient code for once.

Well, "lazy" isn't exactly the word I would use, because anyone making a living writing code, is, almost by definition, not lazy. But they do have a significant transition challenge.

True, I was speaking within that context. PC developers are too used to having too many things done for them, letting rapid user upgrade cycles take care of their inefficiencies.

You heard this same sort of crap when the PS2 was released, and in the end it meant squat. Real development teams with real talent are what's going to push the PS3 and X360 to their limits, not PC developers.
 
mckmas, you shouldn't comment on what you lack knowledge about. These are valid points, and they will be a pain for PC devs getting used to writing code for in-order CPUs.

The fact that I'm not as smart as you about this does not mean that I don't understand what's being said on both sides. I know it's a pain, but devs have to live with it, and in the long run it will all pay off.

Could it be possible that console games will rival PC games longer into their generation than in the past? I think so, with the heavy technology that's going into the PS3 and X360. What do you guys think?
 
In what way are you asking? In terms of graphics, then for the time being, yes. But in terms of actual gameplay, content, or sales, I do believe console games have been and always will be better than most PC games released. It's a lot easier to make a game when you are able to optimize for a specific piece of hardware and can count on the performance you'll get out of it.
 
Real development teams with real talent are what's going to push the PS3 and X360 to their limits, not PC developers.

Do you actually write code for a living? Because I know a LOT of PC developers who would take offense at your statement that they're lazy and lack "real talent" :p

Like I said in another thread, it's hard enough to actually get a game up and running, without having to worry about optimising to the metal. The more a CPU/compiler takes care of optimisation, the more a coder can work on cool physics/AI/whatever.

Also, I'd hardly say that PC developers "rely on the upgrade cycle" and are lazy - console programmers have the benefit of a FIXED SYSTEM, something PC developers NEVER have. They have to support an almost unlimited number of configurations. So they get to spend the time console developers spend optimising on getting their code to run on everyone's system - fun! :)
 
swaaye said:
Well he's 100% right about it being a LOT harder to program for these crazy in-order multi-core oddities.

These chips seem to have been designed to make it easy to boast about them. It's just going to be insanely harder to get max performance out of them. Absolutely nontrivial. And it will require serious changes in how a senior coder from this gen deals with next gen. Nothing like throwing away years of experience on out-of-order cores, eh?

In-order cores haven't been used on PCs since the original Pentium, excluding low-performance Centaur-team chips. Hell, the AMD K5 was out-of-order! The gist of it is that the high-performance chips are all out-of-order designs. OOO has been evolving for like 10 years! And now they just drop it in favor of massive clock speed and multiple cores.

I would go so far as to say that an Athlon XP 3200+ is a decent match for what these new console chips are capable of. At least until (if it's even possible) developers do some serious, down-to-the-metal optimization of the code. MASSIVE man-hours and skill.... I've visited and spoken to the developers at Raven Software a few times. Last time I was there they were telling me how development costs and time are rising out of control for big titles. Well, they sure as hell didn't need this kind of bomb dropped on them. This is going to cost developers a LOT of money. A LOT.

At least the GPUs in the new consoles are full-on designs that seem to have very few serious cost cutting measures going on.

I just think many of you are caught on this hype trail, as is ALWAYS the case with consoles. You don't see just what the CPU engineers have done to these chips.... It's very bad.

The original Pentium Pro ran code made for the Pentium slower than the Pentium did, though once code was made specifically for the Pro (which didn't take long) it performed much better. Wasn't the PPro capable of outperforming CPUs that had 3-4x its theoretical power? The new console CPUs have quite a bit more than that, though.

BTW, how come the G5 CPU had so many fewer transistors than a P4 or Athlon 64? I doubt the huge difference is due to the x86 frontend... could it be that the G5 just didn't have the same kind of out-of-order capabilities as the x86 designs? It would explain why, despite its impressive specs, it often performed far worse, sometimes 1/3 the performance of a top x86 CPU. Perhaps if the G5's out-of-order capabilities weren't very advanced, these new CPUs won't perform much worse per MHz.

If you are programming in a HLL, the fact that the processor is in-order is irrelevant to you.

That's assuming the compiler is not only up to par, but that it likes your particular code. I'd imagine there will be a lot more quirks to learn; I believe Intel has over 100 pages of quirks in the P4 that can be programmed around, and we might see thousands of pages for these console CPUs.
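Purely as an illustration (my own sketch, not from the article or anyone's actual engine code), here's the kind of source-level transformation an out-of-order core would effectively do for you, but which an in-order core relies on the compiler, or the programmer, to get right: a single running sum serializes on the add latency, while several independent accumulators give the pipeline work to issue while earlier adds are still in flight.

```cpp
// Sketch of latency hiding by hand on an in-order core: dot_naive has one
// dependency chain, dot_unrolled has four independent chains the hardware
// can keep in flight without any instruction reordering.
#include <cstdio>

// One accumulator: every add waits for the previous add to finish.
float dot_naive(const float* a, const float* b, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

// Four accumulators: independent chains, the sort of interleaving an
// out-of-order core performs on its own.
float dot_unrolled(const float* a, const float* b, int n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i + 0] * b[i + 0];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)  // leftover elements
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}

int main() {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {1, 1, 1, 1, 1, 1, 1, 1};
    std::printf("%.1f %.1f\n", dot_naive(a, b, 8), dot_unrolled(a, b, 8));
    return 0;
}
```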

That Athlon is not, I repeat, is not a decent match for CELL and the XeCPU!!! It's just NOT the same. The serious amount of FLOPS that CELL can provide is something the Athlon just can't get near.

In that case neither can the Athlon 64. If we just look at the FLOPS, you'd need something like a 16-core Opteron system before you surpass the XeCPU in FLOPS (which I believe achieves around 116 GFLOPS in measured performance, while wasn't there just a post about Cell achieving around 40 GFLOPS in real-world scenarios?).

However, I'm sure he is quite upset that all the effort that went into the HL2 engine is more or less useless (considering the tone of the article) for the lucrative, sinfully delicious engine-licensing market next gen. Lots of sour grapes on his plate, seeing how Epic's development plans and direction with the Unreal engine are now earning them a monopoly over next-gen engine licensing.

There's still a good chance Revolution might be able to run Source without major modifications. Even if Revolution skimps on graphics capabilities, it'll still likely be more than able to handle what Source can do, and maybe Rev will have an easy-to-use CPU. BTW, who would use Source next gen anyhow? The Unreal engine is really the only engine that takes advantage of the graphical capabilities of the next-gen systems.

Source was amazing, costing millions I imagine, and less than a year later it's pretty much dated.

Maybe they should start focusing on selling games over Steam more then: get more small devs to pick up the Source engine and release their games over Steam. IMO Steam's distribution platform is the big deal, not the Source engine; Valve may as well just license the Unreal engine and make modifications to it.

Okay, so how much would that actually cost? I assume WAY more than what Sony and MS are paying. I wasn't talking about unlimited funds here; maybe I should have added that. Within a comparable cost, what is out today, or coming out within a year, that could outdo what the PS3 and X360 are going to do?

I think the die size of one P4 chip is about the same as the tri-core X360 chip's, and given the much higher production volumes of the P4, maybe one dual-core Intel chip could have been substituted for the FLOPS-heavy tri-core X360 CPU. Of course, then you have the whole PC-base thing, which may add more costs onto the rest of the system.
 
BTW, how come the G5 CPU had so many fewer transistors than a P4 or Athlon 64? I doubt the huge difference is due to the x86 frontend... could it be that the G5 just didn't have the same kind of out-of-order capabilities as the x86 designs? It would explain why, despite its impressive specs, it often performed far worse, sometimes 1/3 the performance of a top x86 CPU. Perhaps if the G5's out-of-order capabilities weren't very advanced, these new CPUs won't perform much worse per MHz.

The G5 supports out-of-order execution.

I think the die size of one P4 chip is about the same as the tri-core X360 chip's, and given the much higher production volumes of the P4, maybe one dual-core Intel chip could have been substituted for the FLOPS-heavy tri-core X360 CPU. Of course, then you have the whole PC-base thing, which may add more costs onto the rest of the system.

I vaguely remember hearing that the die size might even be smaller than the P4's die size... can't be sure of this though.
 
They have to support an almost unlimited number of configurations. So they get to spend the time console developers spend optimising on getting their code to run on everyone's system - fun!

And this is exactly why I think console games will hold their own against PC games longer than they ever have in the past. The new hardware design should give it more headroom over its life, and the software hopefully will let devs showcase this amazing power.


If we just look at the FLOPS, you'd need something like a 16-core Opteron system before you surpass the XeCPU in FLOPS (which I believe achieves around 116 GFLOPS in measured performance, while wasn't there just a post about Cell achieving around 40 GFLOPS in real-world scenarios?).

I thought that 40 gig number was for DP. For SP it's around 218 GFLOPS. And don't say that theoretical numbers don't matter, because the x86 designs' theoretical numbers are WAY WAY lower by comparison.
 
mech said:
BTW, how come the G5 CPU had so many fewer transistors than a P4 or Athlon 64? I doubt the huge difference is due to the x86 frontend... could it be that the G5 just didn't have the same kind of out-of-order capabilities as the x86 designs? It would explain why, despite its impressive specs, it often performed far worse, sometimes 1/3 the performance of a top x86 CPU. Perhaps if the G5's out-of-order capabilities weren't very advanced, these new CPUs won't perform much worse per MHz.

The G5 supports out-of-order execution.

I think the die size of one P4 chip is about the same as the tri-core X360 chip's, and given the much higher production volumes of the P4, maybe one dual-core Intel chip could have been substituted for the FLOPS-heavy tri-core X360 CPU. Of course, then you have the whole PC-base thing, which may add more costs onto the rest of the system.

I vaguely remember hearing that the die size might even be smaller than the P4's die size... can't be sure of this though.

Yes, the G5 supports out-of-order execution, but not every out-of-order CPU is equal in that capability. It might not be very good at it.

And I think the die size is smaller than that of the new P4s with 2MB of L2 cache.

I thought that 40 gig number was for DP. For SP it's around 218 GFLOPS. And don't say that theoretical numbers don't matter, because the x86 designs' theoretical numbers are WAY WAY lower by comparison.

I thought the actual benched performance of Cell was around 40 GFLOPS. Which would be pretty impressive, since I don't think any x86 chips come anywhere close. What's the current top x86 single-core performer? I'd imagine somewhere between 10 and 20 GFLOPS.
 
Fox5:

That 40 GFLOPS number is with Cell using double precision, as opposed to the single precision it has obviously been optimized for and which gives it its FLOPS finesse.

Whether this was a peak theoretical number or a measured number of double-precision FLOPS, I do not remember.

edit - whoops... looks like I've been beaten to the finish line yet again :)
 
It may be harder to max out, but that max is much higher than the max of an OOO single core. If that weren't the case there wouldn't be a move towards multicore. OOO sacrifices peak performance for ease of development. I can't believe a single A64 will match Cell at what Cell's good at, including the majority of game things, especially after a few years once the devs have really started to get the hang of it. And then those skills will be instantly portable to the PS4 and other Cell devices, which is very good for the future.

The change had to happen sooner or later. May as well get it over with ;)

Edit : The post I replied to disappeared :oops:
 
Didn't MS make something called XNA, doesn't the PS3 come with a free package of Ageia and Havok software, and aren't plenty of other players like Epic aboard to make this multithreaded thing easier?
Neither of those things is really true. In PS3's case, it's more of an agreement with the respective middleware providers to help push the product; it's not as if any of those packages is a universally applicable solution. Hell, if you can't run a sim rate of at least 60 fps, NovodeX is completely useless (see the sketch at the end of this post).

XNA, btw, doesn't really exist yet. Nobody has any majorly useful tools, or anything new, but it's not like Microsoft has said XNA is complete, either.

BTW, I do think he's blowing things out of proportion when saying that we should throw all our code out the window -- it's not like render pipelines will need a radical change.
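For reference, here's a minimal sketch (hypothetical, my own, not the NovodeX/Ageia API) of what a fixed 60 Hz sim rate looks like in a game loop: physics always advances in fixed 1/60 s steps, decoupled from the render rate, which is why a physics package tuned around that rate is of little use if the game can't sustain it.

```cpp
// Sketch of a fixed-timestep simulation loop at 60 Hz. The physics step
// always advances in dt = 1/60 s increments; if frames take longer, the
// accumulator grows and the sim has to run multiple steps to catch up.
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
    using clock = std::chrono::steady_clock;
    const double dt = 1.0 / 60.0;   // fixed simulation step, seconds
    double accumulator = 0.0;
    double simulated = 0.0;         // total simulated time
    auto previous = clock::now();

    for (int frame = 0; frame < 300; ++frame) {  // stand-in for the game loop
        // Stand-in for per-frame game/render work (~16 ms).
        std::this_thread::sleep_for(std::chrono::milliseconds(16));

        auto now = clock::now();
        accumulator += std::chrono::duration<double>(now - previous).count();
        previous = now;

        // Step the simulation in fixed increments until it catches up with
        // wall-clock time; a real engine would call its physics update here.
        while (accumulator >= dt) {
            simulated += dt;
            accumulator -= dt;
        }
        // ... render, interpolating between the last two physics states ...
    }
    std::printf("simulated %.2f s of physics\n", simulated);
    return 0;
}
```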

Which would be pretty impressive, since I don't think any x86 chips come anywhere close. What's the current top x86 single-core performer? I'd imagine somewhere between 10 and 20 GFLOPS.
Top single-core performance for x86, I believe, is somewhere around 7 GFLOPS. P4s can do up to 4 FLOPs in one instruction using SSE ops, but you can only issue one every other cycle, so it averages out to 2 FLOPs per cycle times the clock speed.

This is as opposed to one SPE, which can do packed FMADDs, giving 25.6 GFLOPS peak per SPE at 3.2 GHz. Realistically, you don't do FMADDs all the time, so you'll be lucky to see 1/3 of that in practice, but that's still better than x86 single-core.
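Working that arithmetic out (the 3.6 GHz P4 clock here is just an assumption for illustration; the SPE figures are from the post above):

```latex
\begin{align*}
\text{P4 (4-wide SSE, one issue every other cycle):}\quad
  & \frac{4~\text{FLOPs}}{2~\text{cycles}} \times 3.6~\text{GHz} \approx 7.2~\text{GFLOPS}\\
\text{SPE (4-wide fused multiply-add, every cycle):}\quad
  & 4 \times 2~\text{FLOPs/cycle} \times 3.2~\text{GHz} = 25.6~\text{GFLOPS}\\
\text{SPE at the ``lucky to see 1/3'' estimate:}\quad
  & 25.6 / 3 \approx 8.5~\text{GFLOPS}
\end{align*}
```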

That 40 GFLOPS number is with Cell using double precision, as opposed to the single precision it has obviously been optimized for and which gives it its FLOPS finesse.

Whether this was a peak theoretical number or a measured number of double-precision FLOPS, I do not remember.
I was under the impression that the double-precision performance was much less than that, around 27 GFLOPS, IIRC. The PPE supposedly has genuine DP scalar units, but the SPEs and the vector unit on the PPE don't; instead they have to be microcode-repathed, so they end up doing a DP vector op once every 15 cycles or something like that.
 