About some info from XB2 leaked documents.

Panajev2001a · Dec 1, 2004

I agree with your analysis here, Gubbi, and I wanted to add that I see a major problem, especially in PlayStation 2 game programming, in the fact that quite a bit of C++ code often has to be rewritten in R5900-friendly ASM, because GCC is nowhere near perfect at optimizing code and hiding main RAM latency (no L2 cache and a tiny 8 KB L1 D-cache): more than one developer puts having a more C/C++ friendly main CPU (or PU ;) ) at a higher priority than fancy ultra-flexible PS 3.xx-PS 4.xx shaders.

OOOE and nicely sized, well-designed L1 and L2 caches do quite a bit to make a CPU more C/C++ friendly, if you catch my drift.

passerby · Dec 2, 2004

Thanks to everyone who posted.

So it does appear that the CPUs for next-gen consoles may have pretty insane clock speeds and performance. As a side note, I can foresee this having an impact on regular PC upgraders when the next-gen consoles launch.

darkblu · Dec 2, 2004

Gubbi,

my memory may be failing me regarding the conception of the SMT acronym, so it may indeed have started out as simultaneous mt., and definitely from a certain moment on it has implied concurrency rather than symmetry. OTOH,

Gubbi said:
The only way you get symmetrical multithreading is if you replicate everything.

replicating pipelines still does not mean you have to have separate, complete PUs on a die. cache, for instance, does not need to be replicated across units. plus, replication is just one way; there are other, well-known ways to achieve more threading symmetry, arguably of a somewhat less simultaneous nature.

Gubbi said:

When you do that on one die it's called CMP (chip multiprocessing); examples are POWER4/5 and the upcoming multicore chips from Intel and AMD. If it's not on one die it's called SMP, symmetrical multiprocessing.

and surprisingly, you can do symmetric multithreading on a single processing unit just as well. and it's been around for quite some time now. google for Tera.

Gubbi said:

The only multithreading CPU preceding the P4 is IBM's Northstar, which was used in their AS/400 product line, and that was switch-on-event (the event being a level 2 cache miss); IBM dubbed that DMT (dual multithreading).

sun have had switch-on-event designs for quite some time. can't recall which of those actually reached production and when exactly.

Gubbi said:

So SMT has always meant simultaneous multithreading; the S in SMT indicates that a CPU can have instructions from different thread contexts in the same pipeline stage simultaneously. This is the way Intel uses it in the P4 documentation, and it is the way it was originally disclosed in the Alpha EV8 descriptions.
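To make the "same stage, same cycle" distinction concrete, here is a toy Python model of a 4-wide issue stage shared by two threads. It is a sketch under simplifying assumptions that are not from the thread: each thread can offer a fixed number of independent instructions per cycle (`ilp`, assumed to be at least 1), the SMT case fills leftover slots in a fixed priority order, and the non-SMT case lets only one thread issue per cycle, alternating round-robin. The function name `simulate_issue` is hypothetical.

```python
from typing import List


def simulate_issue(n_instr: List[int], ilp: List[int], width: int, smt: bool) -> List[List[int]]:
    """Return, per cycle, the thread id occupying each filled issue slot.

    n_instr[t] -- instructions thread t still has to issue
    ilp[t]     -- independent instructions thread t can offer per cycle (assumed constant)
    width      -- issue slots per cycle
    smt        -- True: slots left over by one thread are filled by others in the
                  same cycle; False: only one thread issues per cycle (round-robin)
    """
    remaining = list(n_instr)
    cycles: List[List[int]] = []
    turn = 0
    while any(remaining):
        slots: List[int] = []
        if smt:
            # Fixed priority: thread 0 first, then the others fill what is left.
            for t in range(len(remaining)):
                take = min(ilp[t], width - len(slots), remaining[t])
                slots += [t] * take
                remaining[t] -= take
        else:
            # Skip finished threads, then give the whole cycle to one thread.
            while remaining[turn] == 0:
                turn = (turn + 1) % len(remaining)
            take = min(ilp[turn], width, remaining[turn])
            slots = [turn] * take
            remaining[turn] -= take
            turn = (turn + 1) % len(remaining)
        cycles.append(slots)
    return cycles


if __name__ == "__main__":
    for mode in (True, False):
        sched = simulate_issue([6, 6], [2, 2], width=4, smt=mode)
        print("SMT" if mode else "one thread/cycle", len(sched), "cycles:", sched)
```

With two threads that can each fill only half of the 4-wide stage, the SMT schedule puts both threads into every cycle (`[0, 0, 1, 1]`) and finishes in 3 cycles, while the one-thread-per-cycle schedule needs 6. Real cores also share fetch, rename and caches between threads, which this model ignores.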

oh, it surely is the way intel uses it, no questions here ;) can't say about the alpha, never been into it.

Fafalada · Dec 2, 2004

In the current generation, Xbox has an OOO CPU, while PS2 has an in-order CPU. So PS2 game developers have to think about instruction ordering much more than Xbox devs do.

On the CPU, not really, unless you write massive amounts of asm code - and that just isn't really done anymore. It's more of an issue that the compiler has to 'think' about instruction ordering much more - and unfortunately older versions of GCC suck pretty badly when it comes to that.

But the thing is that by far the greatest bane of the PS2 CPU isn't the execution order, it's the teeny weeny D-cache (even the PSP has more D-cache, for crying out loud, and that's a freaking portable).
And frankly I don't see OOE doing much to solve that either - though that poses an interesting question - would 256K of L2 cache take more die space than reengineering the R59k to have OOE?

Panajev said:
more than one developer puts having a more C/C++ friendly main CPU (or PU ;) ) at a higher priority than fancy ultra-flexible PS 3.xx-PS 4.xx shaders.

It's not just a matter of priorities - it's the fact that all the fancy ultra flexible shader units will do you little good if your game always ends horribly limited by a crippled CPU core.
And in an era when PC porting is more frequent than ever, that issue is also more important than ever, because you frequently end up with codebases that are CPU bound even on platforms with good CPUs.

Megadrive1988 · Dec 2, 2004

this discussion has gone quite a bit over my head

rabidrabbit · Dec 2, 2004

Me too, they should be banned.

Gubbi · Dec 2, 2004

darkblu said:
replicating pipelines still does not mean you have to have separate, complete PUs on a die. cache, for instance, does not need to be replicated across units. plus, replication is just one way; there are other, well-known ways to achieve more threading symmetry, arguably of a somewhat less simultaneous nature.

But in modern CPUs you don't really have a concept of a pipeline. Each stage in the CPU is decoupled from the next, typically with varying instruction throughput capacity.

darkblu said:
and surprisingly, you can do symmetric multithreading on a single processing unit just as well. and it's been around for quite some time now. google for Tera.

Sorry, I'd forgotten about Tera; you are of course correct about it being symmetrical. However, in my defense, Tera themselves call their CPU the Tera MT.

darkblu said:
sun have had switch-on-event designs for quite some time. can't recall which of those actually reached production and when exactly.

MAJC? I'm pretty confident that the Northstar precedes it, though; it's from 1998.

Cheers
Gubbi

Gubbi · Dec 2, 2004

Fafalada said:
And frankly I don't see OOE doing much to solve that either - though that poses an interesting question - would 256K of L2 cache take more die space than reengineering the R59k to have OOE?

Well, the R5900 is running at 300 MHz, i.e. with a cycle time of roughly 3.3 ns. Main memory latency is what? 90 ns? (I'm guessing from the on-die Rambus controller of the Alpha EV7, since I can't find anything on the PS2 memory system.) That means main memory is roughly 30 cycles away, similar to the apparent level 2 cache latency of the next-gen consoles (my estimate for the Xenon CPU), and should be within the capabilities of an OOOE CPU to hide almost completely.
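The cycle arithmetic can be checked in a couple of lines. At exactly 300 MHz the cycle time is about 3.33 ns, so the guessed 90 ns (an estimate, not a measured PS2 figure) comes to 27 cycles; the rounded 3 ns figure gives 30. The second helper is a rough rule of thumb, not something from the thread: to hide a miss completely, a core needs about latency x issue width independent instructions available, here assuming the R5900's 2-wide issue. Both function names are hypothetical.

```python
import math


def latency_in_cycles(latency_ns: float, clock_mhz: float) -> float:
    """Convert a memory latency in nanoseconds into CPU clock cycles."""
    # (clock_mhz * 1e6 cycles/s) * (latency_ns * 1e-9 s) = latency_ns * clock_mhz / 1000
    return latency_ns * clock_mhz / 1000.0


def instructions_to_cover(latency_cycles: float, issue_width: int) -> int:
    """Independent instructions needed to keep every issue slot busy during a miss."""
    return math.ceil(latency_cycles * issue_width)


if __name__ == "__main__":
    cycles = latency_in_cycles(90, 300)      # guessed 90 ns at 300 MHz
    print(cycles)                            # 27.0
    print(90 / 3)                            # 30.0 with the rounded 3 ns cycle time
    print(instructions_to_cover(cycles, 2))  # 54, assuming 2-wide issue
```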

And yes, a 256KB level 2 cache would take up more room than a ROB (look at the first P3s).

Cheers
Gubbi

aaronspink · Dec 2, 2004

darkblu said:
historical flashback: the acronym SMT originated as Symmetrical Multi-Threading. some cpu vendors, though, whose multi-threading cores were not quite symmetrical, changed the acronym into "Simultaneous Multi-Threading" (e.g. intel's HyperThreading is "simultaneous mt"). A true SMT system in the sense of a symmetrical mt system should behave identically (or very close) to an SMP system, i.e. it should be able to carry out multiple threads w/o the latter blocking each other through (implicitly) "mutexed" cpu resources.

Historical reality: SMT stands for Simultaneous Multi-Threading. There is also CMT (Concurrent Multi-Threading) and SOEMT (Switch-On-Event Multi-Threading).

You are confusing the acronym SMP with SMT, which are different things. SMP stands for Symmetrical Multi-Processor. ASMP is Asymmetrical Multi-Processor (PS2 is an example of an ASMP system, as were early MP Apple systems). ASMP and SMP shouldn't be confused with NUMA/UMA, which are orthogonal and describe the structure of the memory system, while ASMP/SMP describe the structure of the processing elements in relation to their capabilities.

Aaron Spink
speaking for myself inc.

aaronspink · Dec 2, 2004

Gubbi said:
The only multithreading CPU preceding the P4 is IBM's Northstar, which was used in their AS/400 product line, and that was switch-on-event (the event being a level 2 cache miss); IBM dubbed that DMT (dual multithreading).

You left out the HEP/Tera lineage, which is also MT, but CMT instead of SOEMT or SMT. The HEP and Tera both predated the P4, and HEP predated N*. Both HEP and Tera were aimed at the HPTC portion of the market.
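The CMT/barrel and SOEMT flavours being distinguished here differ mainly in when the core changes threads. Below is a toy Python model (hypothetical names; not how HEP, Tera or Northstar were actually built) contrasting barrel-style interleaving, which rotates to the next ready thread every cycle, with switch-on-event, which stays on one thread until it hits a miss. It assumes single issue, a fixed miss latency and a zero-cost thread switch; real SOEMT designs pay a pipeline-drain penalty on each switch, which is left out here.

```python
from typing import List, Optional, Tuple


def run_threads(threads: List[str], miss_latency: int, policy: str) -> Tuple[int, str]:
    """Toy single-issue core running several hardware threads.

    Each thread is a string of ops: '.' takes one cycle, 'M' is a load that
    misses and blocks its thread for miss_latency extra cycles.
    policy 'barrel': rotate to the next ready thread every cycle (simplified
                     HEP/Tera-style interleaving).
    policy 'soe':    keep running the current thread until it blocks or ends,
                     then switch (switch-on-event, zero switch cost).
    Returns (total cycles, trace); the trace shows which thread issued each
    cycle, or '-' for an idle cycle.
    """
    if policy not in ("barrel", "soe"):
        raise ValueError(f"unknown policy: {policy}")
    n = len(threads)
    pc = [0] * n        # next op index per thread
    ready_at = [0] * n  # first cycle at which each thread may issue again
    cur = 0
    cycle = 0
    trace = []

    def runnable(t: int) -> bool:
        return pc[t] < len(threads[t]) and ready_at[t] <= cycle

    while any(pc[t] < len(threads[t]) for t in range(n)):
        chosen: Optional[int] = None
        if policy == "barrel":
            for k in range(n):
                t = (cur + k) % n
                if runnable(t):
                    chosen = t
                    cur = (t + 1) % n  # next cycle starts with the following thread
                    break
        else:
            if not runnable(cur):  # the "event": current thread blocked or finished
                for k in range(1, n + 1):
                    t = (cur + k) % n
                    if runnable(t):
                        cur = t
                        break
            if runnable(cur):
                chosen = cur
        if chosen is None:
            trace.append("-")
        else:
            op = threads[chosen][pc[chosen]]
            pc[chosen] += 1
            ready_at[chosen] = cycle + 1 + (miss_latency if op == "M" else 0)
            trace.append(str(chosen))
        cycle += 1
    return cycle, "".join(trace)


if __name__ == "__main__":
    print(run_threads(["..M.."], 5, "soe"))            # (10, '000-----00')
    print(run_threads(["..M..", "..M.."], 5, "soe"))     # (13, '000111--00-11')
    print(run_threads(["..M..", "..M.."], 5, "barrel"))  # (14, '010101----0101')
```

Run back to back, the two threads would take 20 cycles; both multithreaded policies finish in 13 to 14 because one thread's miss overlaps the other thread's work. Which policy comes out ahead depends entirely on the op mix and latency picked here, so the numbers say nothing about the real machines.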

Aaron Spink
speaking for myself inc.

Fafalada · Dec 3, 2004

Gubbi said:
Well, the R5900 is running at 300 MHz, i.e. with a cycle time of roughly 3.3 ns. Main memory latency is what? 90 ns? (I'm guessing from the on-die Rambus controller of the Alpha EV7, since I can't find anything on the PS2 memory system.) That means main memory is roughly 30 cycles away, similar to the apparent level 2 cache latency of the next-gen consoles (my estimate for the Xenon CPU), and should be within the capabilities of an OOOE CPU to hide almost completely.

While that's true (~32 cycles for a cache miss), things get a whole lot worse when you factor in bus contention - this is still a hybrid UMA system, and the CPU is usually the last device to get access.
Also - a 30-cycle cache miss may not sound like much - but when you get more than half a million of them per frame, that suddenly becomes a whole lot...
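A back-of-envelope check of the "half a million misses per frame" point: multiplying the miss count by the miss penalty gives the stall cost, assuming pessimistically that no two misses overlap and ignoring the bus contention mentioned above. The inputs are the figures from this post; the function name is hypothetical.

```python
from typing import Tuple


def serialized_miss_cost(misses: int, penalty_cycles: float, clock_mhz: float) -> Tuple[float, float]:
    """Cycles and milliseconds lost if every miss stalls the CPU with no overlap."""
    cycles = misses * penalty_cycles
    ms = cycles / (clock_mhz * 1000.0)  # clock_mhz * 1000 cycles elapse per millisecond
    return cycles, ms


if __name__ == "__main__":
    cycles, ms = serialized_miss_cost(500_000, 30, 300)
    print(cycles, ms)  # 15000000 50.0
```

Under the no-overlap assumption that is 15 million cycles, about 50 ms at 300 MHz, which is more than a full 60 Hz frame (about 16.7 ms). Real code overlaps or avoids some of those misses, but the scale of the problem is clear.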

Not that I'm saying OOOE wouldn't help - but it's still just a band-aid on an effectively broken cache scheme (I mean c'mon - 8K of 2-way D-cache in a 300 MHz CPU is seriously stretching things).
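To see why a small 2-way cache hurts, here is a toy model of an 8 KB, 2-way set-associative data cache. The 64-byte line size and LRU replacement are assumptions for illustration, not taken from the thread, and the class and function names are hypothetical. Arrays whose base addresses are a multiple of the 4 KB way size apart map to the same sets, so a loop that touches three of them in turn evicts each line before it is reused.

```python
from typing import List, Tuple


class SetAssociativeCache:
    """Toy set-associative cache with LRU replacement (hit/miss counting only)."""

    def __init__(self, size_bytes: int = 8 * 1024, ways: int = 2, line_bytes: int = 64):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (ways * line_bytes)  # 8 KB / (2 * 64 B) = 64 sets
        # Each set is a list of tags ordered from least to most recently used.
        self.sets: List[List[int]] = [[] for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        s = self.sets[index]
        if tag in s:
            s.remove(tag)
            s.append(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(s) == self.ways:
            s.pop(0)  # evict the least recently used line
        s.append(tag)
        return False


def interleaved_streams(n_streams: int, n_elems: int = 256, elem_bytes: int = 4,
                        stride: int = 4096) -> Tuple[int, int]:
    """Read element i of every stream, then element i+1, ... Returns (hits, misses)."""
    cache = SetAssociativeCache()
    for i in range(n_elems):
        for s in range(n_streams):
            cache.access(s * stride + i * elem_bytes)
    return cache.hits, cache.misses


if __name__ == "__main__":
    print(interleaved_streams(2))  # (480, 32): each 64-byte line misses once
    print(interleaved_streams(3))  # (0, 768): three lines fight over two ways
```

Two streams fit and only pay one miss per line; adding a third stream with the same alignment turns every single access into a miss.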

Gubbi · Dec 3, 2004

Fafalada said:

While that's true (~32 cycles for a cache miss), things get a whole lot worse when you factor in bus contention - this is still a hybrid UMA system, and the CPU is usually the last device to get access.
Also - a 30-cycle cache miss may not sound like much - but when you get more than half a million of them per frame, that suddenly becomes a whole lot...

True. But it is exactly this kind of situation where OOOE helps you the most. Latency is in the low tens of cycles, and the fact that it varies just makes hand/static scheduling that much harder (and hence increases the benefit of having a self-scheduling device).
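A toy single-issue model can show why a self-scheduling core copes with tens of cycles of varying latency. It is a sketch under heavy simplifying assumptions (one issue per cycle, every non-load instruction depends on its own iteration's load, an instruction window standing in for a real reorder buffer and scheduler, no limit on outstanding misses); `simulate` and its parameters are hypothetical. A window of 1 behaves like an in-order core that stalls on use; a larger window lets later, independent loads issue while earlier ones are still outstanding.

```python
from typing import Dict, List, Optional, Tuple


def simulate(latencies: List[int], uses_per_load: int, window: int) -> int:
    """Return the cycle at which the last result becomes available.

    Program: for each iteration i, one load with latency latencies[i], followed by
    uses_per_load one-cycle instructions that depend on that load. At most one
    instruction issues per cycle, chosen from the `window` oldest unretired ones.
    """
    prog: List[Tuple[str, int]] = []
    for i in range(len(latencies)):
        prog.append(("load", i))
        prog.extend(("use", i) for _ in range(uses_per_load))

    done_at: List[Optional[int]] = [None] * len(prog)  # cycle each result is ready
    load_ready: Dict[int, int] = {}  # iteration -> cycle its loaded value is available
    head = 0  # oldest unretired instruction (in-order retirement)
    cycle = 0
    while head < len(prog):
        # Retire finished instructions in program order, sliding the window.
        while head < len(prog) and done_at[head] is not None and done_at[head] <= cycle:
            head += 1
        # Issue the oldest ready instruction inside the window, if any.
        for idx in range(head, min(head + window, len(prog))):
            if done_at[idx] is not None:
                continue  # already issued, still in flight
            kind, it = prog[idx]
            if kind == "load":
                done_at[idx] = cycle + latencies[it]
                load_ready[it] = done_at[idx]
                break
            if it in load_ready and load_ready[it] <= cycle:
                done_at[idx] = cycle + 1
                break
        cycle += 1
    return max(d for d in done_at if d is not None)


if __name__ == "__main__":
    lat = [30, 30, 30, 30]
    print(simulate(lat, 2, window=1))   # 128: every miss is paid in full
    print(simulate(lat, 2, window=12))  # 38: the four misses overlap
    print(simulate([20, 40, 25, 35], 2, window=12))  # varying latencies, same code
```

Nothing in the windowed case has to be tuned for a particular latency, which is the point about hand/static scheduling; the model leaves out the bus contention and miss-handling limits that would shrink the gap on real hardware.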

Fafalada said:
Not that I'm saying OOOE wouldn't help - but it's still just a band-aid on an effectively broken cache scheme (I mean c'mon - 8K of 2-way D-cache in a 300 MHz CPU is seriously stretching things).

That is certainly true. I was not saying that OOOE would solve the R5900 memory problems, just that covering 30 cycles of latency is possible with an OOOE processor. Again, it's about picking the lowest-hanging fruit first.

I wonder how much of a transistor diet the R5900 + caches went through prior to the finalization of the PS2 spec. Even a 64KB level 2 cache would have helped a fair amount.

Cheers
Gubbi

Fafalada · Dec 3, 2004

Gubbi said:
I wonder how much of a transistor diet the R5900 + caches went through prior to the finalization of the PS2 spec. Even a 64KB level 2 cache would have helped a fair amount.

That's a good question - although it may have been a result of an old-school console design mindset (which in retrospect would be nearsighted): PS2 is very much like DC with regard to the CPU core (tiny cache, no OOOE), and obviously the generation before was all like that.
It wasn't until Xbox and GC that consoles started to move towards stronger general-purpose CPU performance...

darkblu · Dec 3, 2004

aaronspink said:

Historical reality: SMT stands for Simultaneous Multi-Threading. There is also CMT (Concurrent Multi-Threading) and SOEMT (Switch-On-Event Multi-Threading).

just a few posts above I concurred with Gubbi on the original significance of the S in the SMT acronym - yes, it most likely originated as 'simultaneous' - so my bad. nevertheless, symmetrical multi-threading is just as valid a term, with its specific emphasis on symmetry, and as such it has been used to describe cpu architectures now and then. so to right the wrong of my original post once and for all:

does the acronym SMT stand for 'simultaneous multi-threading' - yes.
does it imply threading symmetry - hardly.
do single-CPU architectures exist, known specifically for their "symmetrical multi-threading" - yes.
have people been known to interpret S in SMT to signify 'symmetrical' rather than 'simultaneous' - occasionally (and that's not just me on this occasion).

You are confusing the acronym SMP with SMT, which are different things. SMP stands for Symmetrical Multi-Processor. ASMP is Asymmetrical Multi-Processor (PS2 is an example of an ASMP system, as were early MP Apple systems). ASMP and SMP shouldn't be confused with NUMA/UMA, which are orthogonal and describe the structure of the memory system, while ASMP/SMP describe the structure of the processing elements in relation to their capabilities.

nope, if you paid attention to my original post you'd have seen that i'm definitely not confusing the SMT and SMP acronyms, otherwise i wouldn't have compared them. what i did confuse was the origin of the S in SMT, as originally introduced in academia/industry, and that came partly as a memory fault on my part, partly as an intel rant (i'm not very fond of their HT, particularly for its asymmetry). nevertheless, i was aware of the 'simultaneous' interpretation of SMT as well, just had a different (apparently erroneous) idea of its original meaning.

aaronspink · Dec 3, 2004

darkblu said:
do single-CPU architectures exist, known specifically for their "symmetrical multi-threading" - yes.

Name one?
Threaded processors:
HEP/Tera are both Barrel/CMT
P4 is SMT
EV8 was going to be SMT
N* was SOEMT
MAJC was/is CMT

darkblu · Dec 3, 2004

aaronspink said:

Name one?
Threaded processors:
HEP/Tera are both Barrel/CMT
P4 is SMT
EV8 was going to be SMT
N* was SOEMT
MAJC was/is CMT

any C-slow-employing architecture, for example. not to mention any core-multiplicity architectures. and Tera's "concurrent" mt is fairly symmetrical, all right.
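C-slow retiming replaces each register in a feedback loop with C registers, so one datapath carries C independent streams, each owning every C-th cycle. A minimal Python sketch of a C-slowed accumulator illustrates that symmetric round-robin interleaving; it assumes equal-length input streams, and the function name is hypothetical.

```python
from collections import deque
from typing import List


def c_slow_accumulate(streams: List[List[int]]) -> List[List[int]]:
    """Running sums of C independent streams computed by one C-slowed accumulator.

    The feedback loop holds C registers (the deque). On cycle k the value that
    entered the loop C cycles earlier, i.e. stream k % C's running sum, comes
    back out, gets that stream's next input added, and goes back in.
    """
    c = len(streams)
    n = len(streams[0])
    if any(len(s) != n for s in streams):
        raise ValueError("this sketch assumes equal-length streams")
    loop = deque([0] * c)  # C pipeline registers in the feedback path
    outputs: List[List[int]] = [[] for _ in range(c)]
    for cycle in range(c * n):
        j = cycle % c  # stream that owns this cycle
        acc = loop.popleft() + streams[j][cycle // c]
        loop.append(acc)
        outputs[j].append(acc)
    return outputs


if __name__ == "__main__":
    print(c_slow_accumulate([[1, 2, 3], [10, 20, 30]]))  # [[1, 3, 6], [10, 30, 60]]
```

Every stream gets exactly the same share of the datapath regardless of what the others do, which is the kind of symmetry being argued about, at the cost of each stream running at 1/C of the clock rate.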
