John Carmack bothered with next-gen multiprocessor consoles

jvd said:
I'm sure that before the end of the year AMD will put out A64 non-FX dual system boards.

I wouldn't mind building a dual A64 3000+. Would be sweet.


That would surely help my rendering times. Gosh Maya can be such a bitch... ;)
 
Simon F said:
Entropy said:
I have to say I'm a bit disappointed in the attitude of some programmers. I've seen science graduate students just buckle down and attack the problem of wringing optimal performance out of quite gnarly architectures.
But he was also talking about the problems of the development cycles increasing and of the spiralling costs. Making the HW harder to use doesn't help matters. I guess it all depends on how good the development tools are in the first years before and after launch - usually these start out as "less than ideal".

All quite true.
Obviously acceptance, quality of early products, and speed(cost) of development will depend on tool quality and availability.

On a personal note though, I'm generally quite sceptical of using software tools to make a multiprocessor appear to be a classical CPU target. Autoparallelizers and autovectorizers have their place, but generally I feel that explicitly taking parallelism into account in the code is the way forward.

The single monolithic CPU has been the default target up until now, but from this point onwards, fitting multiple CPUs on a single die is not only possible but starts to make sense. As lithography moves to progressively finer processes, the case for multiprocessing will only get stronger. PCs, with their focus on backwards compatibility, are likely to benefit relatively little from going much beyond four, or even two, processors for the foreseeable future. Consoles, however, don't have to move so slowly architecturally. Their main limitation is cost.
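As a sketch of what "explicitly taking parallelism into account" can look like (modern C++ used for illustration; the split-in-half strategy and function name are just an example, not anything from the thread):

```cpp
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical example: sum a large array by explicitly splitting the
// work across two threads, rather than hoping a compiler parallelizes it.
long long parallel_sum(const std::vector<int>& data) {
    const std::size_t mid = data.size() / 2;
    long long lo = 0, hi = 0;
    std::thread worker([&] {  // second CPU works on the back half
        hi = std::accumulate(data.begin() + mid, data.end(), 0LL);
    });
    // front half runs on the current CPU in parallel
    lo = std::accumulate(data.begin(), data.begin() + mid, 0LL);
    worker.join();
    return lo + hi;
}
```

The point is that the partitioning decision is visible in the source, where the programmer can reason about it, instead of being left to an autoparallelizer.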
 
Guden Oden said:
Now, there's nothing wrong with the guy's intelligence, he's become immortal through programming techniques such as "Carmack's Reverse" and such, and is a veritable rocket scientist to boot, but the guy is LAZY.

Thus creating a whole new definition of "lazy." :oops:

Even if you use assembly, you only use it for the few code blocks which are executed most frequently. Seldom-used modules really don't need that much optimization (Amdahl's law) - Doom's asm routines concerned sprite and wall rasterization, which are most certainly the procedures with the highest execution frequency.
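To make the Amdahl's law point concrete, a quick sketch (the function and the numbers are illustrative, not from any real profile):

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction p of total runtime
// is accelerated by a factor s, with the rest left untouched.
double amdahl_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

// Example: a 10x asm rewrite of a routine that is 10% of runtime gains
// only ~1.10x overall, while 10x on an 80% hot spot gives ~3.57x.
```

This is why the effort goes into the rasterizer inner loops and not the menu code.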

It's also getting harder and harder even for skilled asm programmers to consistently beat the best C compilers. It's also extremely tedious. Compare this:
Code:
a = strcmp(str1, str2);

With this:

Code:
_strcmp proc  ; return value in ah, strings '$'-terminated
push esi
push edi
lea esi, str1
lea edi, str2
loop_again:
mov al, byte ptr [esi]
mov ah, byte ptr [edi]
inc esi
inc edi
cmp al, ah
ja abov
jb belo
cmp al, '$'
jne loop_again
xor ah, ah
jmp done
abov:
mov ah, 1
jmp done
belo:
mov ah, -1
done:
pop edi
pop esi
ret
_strcmp endp

The saddest aspect is that the string.h C function is probably a lot faster than mine. There are block string instructions on the x86, but they tend to be inflexible.

Edit: Three errors in the assembly code fixed. I'm sure there's more.
Edit 2: One more fixed.
 
akira888 said:
It's also getting harder and harder even for skilled asm programmers to consistently beat the best C compilers. It's also extremely tedious. Compare this:
That's not even really tedious yet :p Try writing inline asm quicksort once (especially with the moronic GCC intrinsics that don't allow register naming - it gets beyond painful trying to read code using 32 registers named "%x" -_- ).

To be fair though, in C++ you can vastly improve compiler optimization capabilities without ever touching ASM thanks to metaprogramming - not that I'm saying those kinds of optimizations are actually easy or not tedious though.
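As a toy illustration of the metaprogramming idea (an editor's sketch, not Fafalada's code): with the loop bound as a template parameter, the compiler sees straight-line code it can fully unroll and schedule, no asm required.

```cpp
#include <cstddef>

// Template metaprogramming as an optimization aid: the length N is a
// compile-time constant, so the recursion below flattens into
// straight-line multiply-adds the optimizer can schedule freely.
template <std::size_t N>
struct Dot {
    static double eval(const double* a, const double* b) {
        return a[0] * b[0] + Dot<N - 1>::eval(a + 1, b + 1);
    }
};
template <>
struct Dot<0> {
    static double eval(const double*, const double*) { return 0.0; }
};

// Usage: Dot<4>::eval(a, b) compiles to four multiply-adds, no loop.
```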
 
akira888 said:
It's also getting harder and harder even for skilled asm programmers to consistently beat the best C compilers. It's also extremely tedious. Compare this:
Code:
a = strcmp(str1, str2);

With this:

Code:
_strcmp proc  ; return value in ah, strings '$'-terminated
push esi
push edi
lea esi, str1
lea edi, str2
loop_again:
mov al, byte ptr [esi]
mov ah, byte ptr [edi]
inc esi
inc edi
cmp al, ah
ja abov
jb belo
cmp al, '$'
jne loop_again
xor ah, ah
jmp done
abov:
mov ah, 1
jmp done
belo:
mov ah, -1
done:
pop edi
pop esi
ret
_strcmp endp

The saddest aspect is that the string.h C function is probably a lot faster than mine. There are block string instructions on the x86, but they tend to be inflexible.

akira,

essentially your example compares a function call to a function definition as there's no built-in strcmp operator in c ;)
 
Fafalada said:
That's not even really tedious yet :p Try writing inline asm quicksort once (especially with the moronic GCC intrinsics that don't allow register naming - it gets beyond painful trying to read code using 32 registers named "%x" -_- ).

beg your pardon, gcc's inline asm is far from moronic, as it actually works _with_ the cc optimiser/backend, rather than _against_ it (which is the case with the almighty, "developer's dream" vc)
 
Yes, that was more of an example than anything. :) "switch" is in every core C specification however.

Code:
switch (functionid) {
  case 1: r = a + b; break;
  case 2: r = a - b; break;
  case 3: rlong = (long)(a * b); break;
  case 4: r = a / b; break;
}
Code:
mov ax, functionid
cmp ax, 2
ja three_or_more
je two_here
mov ax, a
add ax, b
mov r, ax
jmp endh
two_here:
mov ax, a
sub ax, b
mov r, ax
jmp endh
three_or_more:
cmp ax, 4
je four_here
mov ax, a
imul b        ; signed 16x16 multiply, result in dx:ax
mov bx, dx
shl ebx, 16
mov bx, ax
mov rlong, ebx
jmp endh
four_here:
mov ax, a
cwd           ; sign-extend ax into dx for the signed divide
idiv b
mov r, ax
endh:
....
 
Dio said:
Simon F said:
it's the old adage "You can lead a programmer to water but you really have to hold his head down for a long time."
I want that on my wall.
Well, before you print it off, let me adjust it slightly...

"You can lead an ISV to water but you really have to hold his head down for a long time."

Of course, this does not reflect any opinion of any companies I may be involved with :)
 
akira888 said:
Yes, that was more of an example than anything.

you're still missing the point - your example does not serve its purpose - by comparing a function call to a function definition from two different languages you don't show/prove anything. see?

ed: ok, i see you got my point :)
 
beg your pardon, gcc's inline asm is far from moronic, as it actually works _with_ the cc optimiser/backend, rather than _against_ it (which is the case with the allmighty, "developer's dream" vc)
:) I admit, I was being harsh, but only because I care. I actually prefer GCC's inline asm to all those I've used on wintel.
The naming thing just frustrated me because I don't think it would be a big deal to allow replacing those %s with custom strings. I eventually got around to using the preprocessor for naming, but it's a hassle I could live without.
 
Deadmeat said:
CELL is a product of an analog electrical engineer who is clueless about computer architecture and a computer hardware architect who doesn't give a shit about what programmers say.

Stupid me, who saw those 5 IBM fellows with tons of programming experience credited with having defined the CELL architecture.

I am sure the analog engineer you are talking about is a high-level Sony official, right?

You come to this thread with all the good intentions, right, Deadmeat? I must be imagining all the name-calling and groundless accusations you make.

Since I am not allowed to post the clown pic+midi (pending the locking of the thread as a result), I will just think about it in my head while re-reading your post, you undepletable source of amusement (TM by darkblu).
 
akira888 said:
Thus creating a whole new definition of "lazy." :oops:

Yeah, haha, I admit it is pretty novel :LOL: but really, he IS, if you think about it.

In order for a monolithic PC architecture to reach the theoretical performance of a multiprocessor system it needs the MOTHER of all CPUs, and performance does NOT scale linearly with either clock speed or transistor usage, so it's not as if that chip is going to be very efficient. And definitely not CHEAP, easy to cool, etc etc etc.

Truth is, the days of single-CPU systems are numbered; they're being replaced by either physical or virtual multiprocessors at an ever-increasing pace. A while after that, systems with a single virtual MP CPU will become extinct too.

Carmack better start preparing NOW, because the days of the PC as we know it are numbered. Legacy hardware is going, though at a slow pace. Unfortunately it seems PCI will be gone before the age-old serial port, which dates back to the latter half of the 60s, and one day there won't be a parallel port, floppy interface or PS/2 connectors either. Still, PCs won't be as optimized as consoles, ever.

I count almost 50 processes of various types running on my own PC, and that does not include the number of sub-threads in each of those processes. A console won't be bogged down with all that garbage.

Multiprocessing is the easiest way to reach massive performance without having to pay a massive amount of money. Complaining that programming such systems won't be easy doesn't help, because this change is going to HAPPEN whether any particular programmer wants it or not. Efforts are better spent researching different approaches that work in a multiprocessor environment.

It's also getting harder and harder even for skilled asm programmers to consistently beat the best C compilers. It's also extremely tedious.

This I never disputed... Quite the opposite. ;)
 
Guden Oden said:
I count to almost 50 processes of various types running in my own PC, and that does not include the number of sub-threads in each of those processes.


Then you need to start learning how to use a PC... ;) Or don't complain about it ;) j/k
 
london-boy said:
Guden Oden said:
I count to almost 50 processes of various types running in my own PC

Then you need to start learning how to use a PC... ;) Or don't complain about it ;) j/k

Haha! Well, actually I didn't complain, I just made an observation. And the reason there's so much running in my PC is I have lots of stuff connected to my PC (like 10 USB devices, all 5 PCI slots filled, utility programs etc - my systray typically has 14+ icons in it).

So, it's not that I don't know how to use a PC - I am actually a power user. ;) I do need more RAM though - 512MB isn't enough these days. XP is really nice, but SUCH a RAM-hog.
 
Guden Oden said:
london-boy said:
Guden Oden said:
I count to almost 50 processes of various types running in my own PC

Then you need to start learning how to use a PC... ;) Or don't complain about it ;) j/k

Haha! Well, actually I didn't complain, I just made an observation. And the reason there's so much running in my PC is I have lots of stuff connected to my PC (like 10 USB devices, all 5 PCI slots filled, utility programs etc - my systray typically has 14+ icons in it).

So, it's not that I don't know how to use a PC - I am actually a power user. ;) I do need more RAM though - 512MB isn't enough these days. XP is really nice, but SUCH a RAM-hog.

Off topic, but i'm still "floating" very comfortably with my 1GB Ram, like, there's always around 700MB FREE, when i'm not doing much (like watching a video or something)...

1GB RAM in next gen consoles would be total bliss (for the time being)
 
Bjorn said:
And then you have to take into consideration that he is one of the few that has actually supported multiple CPU's in his engines.

He took his engine, written for serial execution, and split it up at a natural point so he could introduce pipelining. Pipelining is a very limited form of parallelism, though. He never pretended he wrote his engine to be remotely optimal for an SMP machine.
 
Guden Oden said:
I count to almost 50 processes of various types running in my own PC, and that does not include the number of sub-threads in each of those processes. A console won't be bogged down with all that garbage.
How many of them simultaneously use more than 0.1% of your CPU time? :)

I ran a 2-CPU box as my main system for about 18 months. It was great to be able to compile and play Unreal Tournament simultaneously, but that was the only benefit I saw. I got more 'general' performance improvement overclocking the CPU 10% than putting the second CPU in.

Multiprocessing is the easiest way to reach massive performance without having to pay a massive amount of money, complaining that programming such systems won't be easy doesn't help because this change is going to HAPPEN wether any particular programmer wants it or not. Efforts are better spent researching different approaches that works in a multiprocessor environment.
JC, alone AFAIK amongst game developers, made an effort and did a reasonable job, getting a 20-40% speedup (on the aforementioned system) by a clever trick using the way the internals of the Q3 engine worked.

The problem is that multiprocessor performance isn't a solved issue. It's not possible to say 'it's the cheapest way to reach massive performance' because it's only theoretical performance. The theoretical performance has (up until now) only been reached in a reasonably limited set of situations.

It's also worth noting that multiprocessor architectures have to be considered as a whole system. It's not just '2 of this CPU' because you need to consider communication, synchronisation etc. The PC model for CPU-CPU comms (shared memory, mutexes, MESI caches) is dreadful for a console-type environment. Effectively the two cannot pipeline-process data - the system's only really designed to work on separate processes rather than different parts of the same thing. Personally I don't think a Transputer-like architecture (no shared memory, message passing) is much of a solution either.
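For illustration, the shared-memory model being criticized boils down to something like this mutex-guarded channel between two pipeline stages (a minimal modern-C++ sketch, not from the thread; every hand-off pays for locking and cache-line traffic, which is the overhead the post alludes to):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// A channel between two pipeline stages, built on the PC primitives
// (shared memory + mutex + condition variable). Each send/receive pair
// costs a lock round-trip and bounces the cache line between CPUs.
class Channel {
    std::queue<int> q;
    std::mutex m;
    std::condition_variable cv;
public:
    void send(int v) {
        std::lock_guard<std::mutex> lock(m);
        q.push(v);
        cv.notify_one();
    }
    int receive() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return !q.empty(); });
        int v = q.front();
        q.pop();
        return v;
    }
};
```

A Transputer-style design would make something like this channel a hardware primitive instead of layering it over coherent shared memory.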

One presumes a console architecture relying on multiprocessing will make more effort in this direction.
 