John Carmack bothered with next-gen multiprocessor consoles

MfA said:
He never pretended he wrote his engine to be remotely optimal for an SMP machine.
It was a very good way of doing it given the architecture of the Q3 engine and the limitations of the PC environment. It is a good question as to what would have been done differently with a 'SMP engine'.
 
Simon F said:
OTOH, it is relatively easy for HW designers to bung down multiple cores and say that they have a high performance part. I would say that JC is lamenting this direction.

He is in no position to judge if there is an alternative ... he cannot know, to the extent the console designers do, what the peak performance difference would have been between a serial and a parallel solution of the same cost. Obviously there will always be a crossover point where arguing in favour of serial execution becomes ludicrous ... say, preferring a 1% improvement in serial execution speed over a 1% decrease combined with a 200% improvement in peak performance is not realistic.

His opinion is important, but the console manufacturers have good sw designers too ... and they have both sw designers and hw designers, both with a lot more information than Carmack has.

Marco

PS. message passing does not prohibit the use of shared memory as a way of communicating data, messages can contain references (there are even ways of maintaining CSP purity with reference passing, if you want that).
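
A minimal sketch of what that can look like in modern C++ (purely illustrative - the Message/MsgQueue names are invented here, not from any real engine): the message handed between threads is tiny, but it carries a pointer into shared memory, so the bulk data is never copied.

// Sketch: message passing where the message carries a reference into shared memory.
// Message and MsgQueue are hypothetical names, for illustration only.
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Message {
    const float* data;   // reference into shared memory - not a copy
    std::size_t  count;
};

class MsgQueue {
public:
    void push(Message m) {
        { std::lock_guard<std::mutex> lock(mtx_); q_.push(m); }
        cv_.notify_one();
    }
    Message pop() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        Message m = q_.front();
        q_.pop();
        return m;
    }
private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<Message> q_;
};

int main() {
    std::vector<float> shared(1024, 1.0f);  // shared buffer both threads can see
    MsgQueue queue;

    std::thread consumer([&] {
        Message m = queue.pop();            // receives a small message...
        float sum = 0.0f;
        for (std::size_t i = 0; i < m.count; ++i)
            sum += m.data[i];               // ...but reads the shared data in place
        std::printf("sum = %f\n", sum);
    });

    queue.push({shared.data(), shared.size()});  // pass a reference, not the data
    consumer.join();
}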
 
Tuttle said:
What a dope.

Your attitude is getting increasingly annoying. I made a really stupid mistake; everyone does.

By the way, there will apparently be a successor to the Xbox, sorry. I could make a witty quip but I'll just let your prediction stand or fall on its own.
 
MfA said:
PS. message passing does not prohibit the use of shared memory as a way of communicating data, messages can contain references (there are even ways of maintaining CSP purity with reference passing, if you want that).
Sorry - I never meant to imply that it did. I can visualise all sorts of architectures that mix the 'PC' and 'Transputer' models and there are lots of other things (NUMA, DMA engines, multiplexers, etc.) that could also be involved.
 
Fafalada said:
It's also getting harder and harder even for skilled asm programmers to consistently beat the best C compilers. It's also extremely tedious. Compare this:
That's not even really tedious yet :p Try writing an inline asm quicksort once (especially with the moronic GCC intrinsics that don't allow register naming - it gets beyond painful trying to read code using 32 registers named "%x" -_- ).

To be fair though, in C++ you can vastly improve the compiler's optimization capabilities without ever touching ASM, thanks to metaprogramming - not that I'm saying those kinds of optimizations are actually easy or not tedious, though.
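
A small example of the kind of thing meant here (just a sketch, nothing from the poster's actual code): a recursive template forces a fixed-size loop to be fully unrolled at compile time, so the compiler sees straight-line arithmetic with no loop counter or branches.

// Sketch: template metaprogramming to force full unrolling of a fixed-size loop.
// The Dot name is illustrative only.
#include <cstdio>

template <int N>
struct Dot {
    static float apply(const float* a, const float* b) {
        return a[N - 1] * b[N - 1] + Dot<N - 1>::apply(a, b);
    }
};

template <>
struct Dot<0> {
    static float apply(const float*, const float*) { return 0.0f; }
};

int main() {
    float a[4] = {1, 2, 3, 4};
    float b[4] = {5, 6, 7, 8};
    // The recursion is resolved at compile time; the generated code is just
    // four multiply-adds, which the compiler can then schedule freely.
    std::printf("%f\n", Dot<4>::apply(a, b));
}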

Actually I suspect most good games programmers could pretty easily write assembler that beats even the best compilers if you're just counting clock cycles. The problem is that 99% of problems are bounded by memory latency anyway, so even poor compilers have comparable execution times to hand-written assembler.

I wrote a skinned animation system once on an N64. I wrote a C version and then rewrote it in assembler; the assembler version took less than half the clock cycles of the C version and still only outperformed it by about 5%, because both versions had to read all of the verts.

The other side of this is that I've doubled the execution speed of an Xbox particle system by swapping the declaration order of two variables in a structure, and adding a single prefetch instruction.
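
To illustrate the sort of change being described (a hypothetical example, not the actual Xbox code): keep the hot fields of a structure together so they share a cache line, and prefetch the next element while working on the current one. __builtin_prefetch is the GCC/Clang builtin; other compilers have equivalents.

// Sketch: declaration order and a single prefetch, nothing else changed.
#include <cstddef>

struct ParticleBad {
    float pos[3];
    char  debugName[64];   // cold data wedged between the hot fields
    float vel[3];
};

struct ParticleGood {
    float pos[3];          // hot fields declared together...
    float vel[3];
    char  debugName[64];   // ...cold data pushed to the end
};

void update(ParticleGood* p, std::size_t count, float dt) {
    for (std::size_t i = 0; i < count; ++i) {
        if (i + 1 < count)
            __builtin_prefetch(&p[i + 1]);   // start pulling the next particle in early
        for (int k = 0; k < 3; ++k)
            p[i].pos[k] += p[i].vel[k] * dt;
    }
}

int main() {
    static ParticleGood particles[256] = {};
    update(particles, 256, 0.016f);
}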

Optimisation these days is just very complicated.
 
And back on topic ...

I think the real issue with multithreaded/multiprocessor architectures is that they are incredibly hard to debug. It is extremely easy to screw yourself with a multithreaded system, even if you know exactly what you're doing.
On a team of 30 or so engineers, only 5 or 6 are likely senior enough to understand the ramifications of running something on another thread. And FWIW the last game I worked on had 14+ active threads, and I wouldn't wish that nightmare on anyone.
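
As a tiny illustration of how easy it is to get bitten (an invented example, not from the game in question): two threads that take the same two locks in opposite order will usually run fine, then one day each grabs its first lock and waits on the other forever.

// Sketch: a classic lock-ordering deadlock. May run fine many times, then hang.
#include <mutex>
#include <thread>

std::mutex physicsLock;
std::mutex renderLock;

void threadA() {
    std::lock_guard<std::mutex> a(physicsLock);
    std::lock_guard<std::mutex> b(renderLock);   // A holds physics, waits on render
}

void threadB() {
    std::lock_guard<std::mutex> a(renderLock);
    std::lock_guard<std::mutex> b(physicsLock);  // B holds render, waits on physics
}

int main() {
    std::thread t1(threadA), t2(threadB);
    t1.join();
    t2.join();
}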
 
Did you use any automated tools for deadlock/race-condition/etc. detection? If so, could you just buy a suitable tool off the shelf or did you have to do it all yourself?
 
MfA said:
Did you use any automated tools for deadlock/race-condition/etc. detection? If so, could you just buy a suitable tool off the shelf or did you have to do it all yourself?

We were stuck with PS2 tools, and until the end of the project we couldn't even view the callstack for anything other than the running thread. So a lot of the time we'd just see the idle thread was running :/

We could debug by looking at the list of semaphores blocking the threads and working backwards from what each thread was waiting on, to try and piece together what had happened.

Debugging on the other platforms was easier, but there were certain problems (usually graphics engine related) that would only be exposed on the PS2.

The biggest issue was the non-deterministic nature of the crashes, and how you get adequate coverage in your test plan.

Note that multithreaded engines on a PS2 are by and large not a good idea from a performance standpoint, but we inherited this one.
 
Guden Oden said:
but the guy is LAZY. He quickly embraced high-level languages in games programming (DOOM from 93 had just two assembler routines spliced into it, and after that I guess there's nothing at all), because it's quicker and easier.

As far as I know, Carmack was learning assembly at that time. Then ID hired Michael Abrash, and the two of them went mad with ASM optimizations on the Quake engine, which is still one of the fastest software rasterizers on the PC platform AFAIK. Unreal 1 might have had more features, but even a P4 can't run it at high resolutions.
 
Dio said:
The problem is that multiprocessor performance isn't a solved issue. It's not possible to say 'it's the cheapest way to reach massive performance' because it's only theoretical performance. The theoretical performance has (up until now) only been reached in a reasonably limited set of situations.
Hence, the earlier and the more effort developers expend on this the better, eh? Until entirely new methods are developed (diamond semiconducting? quantum computing?) and made practical, the scale of performance increases for single-chips is just looking worse and worse as time goes on--more R&D spent for less advancement... Considering the parties involved, and the future chips we're looking at, you'd think there'd be less complaining about the inevitable, and more effort spent in making future shifts work better.
 
The last game I worked on had every chip in the machine running full blast, along with the disc streaming game data in constantly, all in parallel. We would have loved to have twice the number of chips, or ten times the number. Game engines are one of the easiest types of code-bases to break up into asynchronous parallel tasks.
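
As a rough sketch of what "asynchronous parallel tasks" means here (hypothetical task names, not the actual game's systems): independent per-frame jobs get kicked off together and joined before the frame ends, while the main thread keeps doing its own work.

// Sketch: per-frame systems run as parallel tasks. All names are illustrative.
#include <future>

void updatePhysics()   { /* ... */ }
void updateAnimation() { /* ... */ }
void mixAudio()        { /* ... */ }
void streamAssets()    { /* ... */ }

void runFrame() {
    auto physics   = std::async(std::launch::async, updatePhysics);
    auto animation = std::async(std::launch::async, updateAnimation);
    auto audio     = std::async(std::launch::async, mixAudio);
    streamAssets();                 // main thread keeps busy in the meantime
    physics.wait();
    animation.wait();
    audio.wait();
}

int main() {
    for (int frame = 0; frame < 3; ++frame)
        runFrame();
}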

Every single console game I've worked on has utilized every chip in the system, with the only exception being VU0 on one title that was using middleware.

Carmack is the dope. He should stick to his area of expertise, shadow and light code for low-poly enclosed rooms. When he can write commercial game code that runs on a $200 piece of hardware, I'm sure the console dev community would listen to his opinions. If he wants to impress people with his critique of console hardware design, he can show his design for a single-chip system that competes both in performance and manufacturing cost and that doesn't require a monopoly revenue stream to fund it.
 
ERP said:
Actually I suspect most good games programmers could pretty easily write assembler that beats even the best compilers if you're just counting clock cycles.
Like I said, particularly with stuff like arithmetic classes you can do that even without ASM. Compilers are optimizing on far too narrow a domain to catch most of the stuff that can be done once you, for example, move away from single-register-sized variables.

Optimisation these days is just very complicated.
Oh, I'm all too well aware of those issues, particularly memory access limitations. The best gains from my asm code are usually due to the fact that I actually USE all those registers in the R59k to good effect, unlike the compiler - and hence the memory dependencies are considerably fewer even before we start counting cycles.
Of course even that's not always enough - in my last experience I had two functions (one taking 80% of the time), and after optimizing the bigger one for speed and reducing the cache misses within it to 10% of the original, things still barely got any faster.
The reason? After removing the cache misses in the first function, the second ended up thrashing the cache: one of the arrays it accessed used to be cached much better, thanks to some memory coherency with completely different data, back when the first function did all the missing - and now that never happened anymore.
The sad part is that the second function's code is like 5x shorter to begin with, and it still took an idiotic amount of time until I fixed it.


Anyway, back on topic: that story with the multithreaded app is scary - I remember you told me the codebase was bad, but that's worse than I'd have thought, given the target platform.
 
...

The last game I worked on had every chip in the machine running full blast along with the disc streaming game data in constantly, all in parallel.
You don't sound like a real developer.

Anyway, as bad as the current situation surrounding multi-processing console designs is, it CAN BE overcome with proper software engineering practices. Say bye bye to multithreading, say hello to piped parallel processes. Why multithread in a single process and kill your brain cells, when you can in fact divide your software into many independent modules with careful analysis and then hook them up together via pipes? I am very much against many threads per process, but I really don't mind many single-threaded processes working together via pipes.

Oddly enough, the CELL architecture is more suited for this approach than the Power5. Let's just pray that the NT kernel's pipe performance is good enough to support this programming model...
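
For what it's worth, a minimal sketch of the "piped processes" model being argued for (POSIX pipe/fork, purely illustrative): two single-threaded processes connected by a pipe instead of two threads sharing state, each of which can be written and tested on its own.

// Sketch: two single-threaded processes talking over a pipe (POSIX).
#include <cstdio>
#include <cstring>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    int fd[2];
    if (pipe(fd) != 0)
        return 1;

    if (fork() == 0) {                       // child: the "consumer" process
        close(fd[1]);
        char buf[64] = {0};
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0)
            std::printf("consumer got: %s\n", buf);
        close(fd[0]);
        return 0;
    }

    close(fd[0]);                            // parent: the "producer" process
    const char* msg = "frame 1 done";
    if (write(fd[1], msg, std::strlen(msg)) < 0)
        return 1;
    close(fd[1]);
    wait(nullptr);
    return 0;
}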
 
Re: ...

Deadmeat said:
The last game I worked on had every chip in the machine running full blast along with the disc streaming game data in constantly, all in parallel.
You don't sound like a real developer.

Anyway, as bad as the current situation surrounding multi-processing console designs is, it CAN BE overcome with proper software engineering practices. Say bye bye to multithreading, say hello to piped parallel processes. Why multithread in a single process and kill your brain cells, when you can in fact divide your software into many independent modules with careful analysis and then hook them up together via pipes? I am very much against many threads per process, but I really don't mind many single-threaded processes working together via pipes.

And you sound like you heard a lot of catchwords and threw 'em out not really knowing what they mean. The two approaches aren't mutually exclusive, nor is it possible to argue that one is definitively better than the other unless you're slower than a lump of sand. Care to explain why separate processes are better, especially on systems where memory and speed are at a premium?
 
...

The two approaches aren't mutually exclusive
Mixing the two defeats the purpose of going in either direction in the first place. You want simpler and easier parallelism? Then go multiprocess.

nor is it possible to argue that one is definitively better than the other
Today's computing environment does have enough memory to support the multiprocess approach.

why separate processes are better
You can code them and test them individually before piping them together. Can't do that with multithreaded code. Not to mention that the dreaded synchronization issue is gone....
 
Re: ...

Deadmeat said:
The two approaches aren't mutually exclusive
Mixing the two defeats the purpose of going in either direction in the first place. You want simpler and easier parallelism? Then go multiprocess.

nor is it possible to argue that one is definitively better than the other
Today's computing environment does have enough memory to support the multiprocess approach.

why separate processes are better
You can code them and test them individually before piping them together. Can't do that with multithreaded code. Not to mention that the dreaded synchronization issue is gone....


I don't think I need to argue any further. :) All I can say is, you sound like someone who only has experience with processes and is hiding his ignorance behind made-up, irrelevant arguments. :)
 
..

I don't think I need to argue any further.
Of course you don't. You lost.

All I can say is, you sound like someone who only has experience with processes
You show you don't know what you are talking about... After all, you can't even distinguish between a process and a thread. Of course even beginning coders have experience with processes...
 
...

Anyhow, if some of you find the context-switching cost of multiple processes too heavy, then I suppose you can implement something similar in a multithreaded single process. You would initiate individual threads from main, and the only form of shared memory access given to each thread would be circular queues owned by main, with one thread only writing to a given queue and another only reading from it. This way, a pipe-like data transfer between threads can be supported.
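
A bare-bones sketch of such a queue (illustrative only - the SpscRing name and sizes are made up): because exactly one thread writes and one thread reads, the ring's two indices are the only shared state and no locks are needed.

// Sketch: single-writer/single-reader circular queue between two threads.
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <thread>

template <typename T, std::size_t N>
class SpscRing {
public:
    bool push(const T& v) {                   // called only by the writer thread
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire))
            return false;                     // full
        buf_[head] = v;
        head_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {                        // called only by the reader thread
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;                     // empty
        out = buf_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);
        return true;
    }
private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

int main() {
    SpscRing<int, 256> ring;
    std::thread writer([&] {
        for (int i = 0; i < 1000; ++i)
            while (!ring.push(i)) {}          // spin if the queue is momentarily full
    });
    int value = 0, received = 0;
    while (received < 1000)
        if (ring.pop(value))
            ++received;
    writer.join();
    std::printf("received %d values\n", received);
}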

But this does require careful planning and analysis. Yes, a different way of coding than what many developers are used to, but definitely doable.
 
Wow, I can't believe Carmack said those things. Just shows you that the brightest among us sometimes say the dumbest things.

Anyway, no one is forcing anyone to write for the other CPUs if you only want to write for one, which should be powerful enough on both Xbox2/PS3. The whole point of having multiple CPUs is more power. That's never a bad thing.

I think the CELL software design makes parallelism easier than before, as it seems to be semi-transparent, if the developer concentrates on writing modular code and lets the OS handle the process distribution. Is that not the whole point of the CELL software modules?
 
The whole point of having multiple CPUs is more power. That's never a bad thing.

I agree.

While JC is a wiz at making PC 3D gaming engines, he is not the most influential person in the console industry. He doesn't design CPU or graphics hardware, afaik.
 