Beating Emotion Engine

Status
Not open for further replies.

patroclus02

Newcomer
PS2's Emotion Engine is massive FP calculator, with peak 6GFLOPS.
I suppose this comes from a 150MHz clock and 10 FMAC units working in parallel (a 128-bit MUL + ADD per cycle would be 4 FP operations per cycle per FMAC), yielding 6 GFLOPS.

A 2GHz Athlon can do an FMUL and an FADD at the same time using separate execution units (in 64-bit chunks per clock, so 2 adds and 2 products per clock), getting 2GHz * 4 FP ops/cycle = 8 GFLOPS.

Does it mean it would beat a PS2 in floating point performance??
I don't think any PS2-quality game would run on a 2GHz Athlon in software mode, without a graphics accelerator...
 
patroclus02 said:
PS2's Emotion Engine is massive FP calculator, with peak 6GFLOPS.
I suppose this comes from a 150MHz clock and 10 FMAC units working in parallel (a 128-bit MUL + ADD per cycle would be 4 FP operations per cycle per FMAC), yielding 6 GFLOPS.

A 2GHz Athlon can do an FMUL and an FADD at the same time using separate execution units (in 64-bit chunks per clock, so 2 adds and 2 products per clock), getting 2GHz * 4 FP ops/cycle = 8 GFLOPS.

Does it mean it would beat a PS2 in floating point performance??
I don't think any PS2-quality game would run on a 2GHz Athlon in software mode, without a graphics accelerator...


6.2 GFLOPS peak, 300 MHz (er, 294 MHz) clock. but hey, i like that "Emotion Engine is massive FP calculator"... has a nice ring to it ;)

sorry i'll leave others to answer your questions, i was just about to hit the sack
 
I suppose this comes from a 150MHz clock and 10 FMAC units working in parallel (a 128-bit MUL + ADD per cycle would be 4 FP operations per cycle per FMAC), yielding 6 GFLOPS.
Actually not quite. Breakdown would be
10FMACs * 2FP/cycle @ 300Mhz = 6GFlops/s.
+
4FDivs * 1/7FP/cycle @ 300Mhz ~ 200MFlop/s.
=
6.2GFlop/s
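
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the breakdown above. The unit counts, per-cycle rates and clocks are simply the figures quoted in this thread (the Athlon line uses the 4 FP ops/cycle assumption from the first post); `peak_flops` is a made-up helper name, and none of this is a measurement.

```python
# Peak-FLOPS arithmetic using the figures quoted in this thread.
# These are theoretical peaks, not measured throughput.

def peak_flops(units: int, flops_per_unit_per_cycle: float, clock_hz: int) -> float:
    """Theoretical peak = units * FLOPs each unit retires per cycle * clock."""
    return units * flops_per_unit_per_cycle * clock_hz

EE_CLOCK_HZ = 300_000_000        # nominal 300 MHz (the thread also mentions 294 MHz)
ATHLON_CLOCK_HZ = 2_000_000_000  # 2 GHz

ee_fmac = peak_flops(10, 2, EE_CLOCK_HZ)     # multiply + add counted as 2 FLOPs
ee_fdiv = peak_flops(4, 1 / 7, EE_CLOCK_HZ)  # one divide result every 7 cycles
ee_total = ee_fmac + ee_fdiv
athlon = peak_flops(1, 4, ATHLON_CLOCK_HZ)   # 2 adds + 2 multiplies per clock

for name, value in [("EE FMACs", ee_fmac), ("EE FDIVs", ee_fdiv),
                    ("EE total", ee_total), ("Athlon 2GHz", athlon)]:
    print(f"{name:12s} {value / 1e9:5.2f} GFLOPS")
```

The divider term works out to roughly 170 MFLOPS, which the breakdown rounds up to about 200, hence the quoted 6.2 total.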

At any rate, the bulk of the FP power in the EE is in its two vector streaming processors, so raw numbers don't really tell the whole story; there are problems where this setup can significantly outperform a general-purpose processor with equivalent FP throughput.
 
patroclus02 said:
Does it mean it would beat a PS2 in floating point performance??
Yes, that is entirely within the realm of reason. The EE looks immensely strong on paper, but even experienced PS2 programmers say it's very hard to harness its power fully in reality, especially for VU0 (the simpler of the two vector processors).

I don't think any PS2-quality game would run on a 2GHz Athlon in software mode, without a graphics accelerator...
Of course not. There's a lot more to the EE than just the GFLOPS it spits out. The architecture of the processor could hardly be more different from a monolithic traditional CPU like the Athlon. If the Athlon had to emulate the EE, it would require a LOT more horsepower than a 2GHz chip can provide to do that, even though it's a stronger floating point processor. The EE architecture is so different, and with only roughly 6 2/3 Athlon clock cycles available per EE cycle, it would most likely be totally impossible to fit in all the work required to emulate the MIPS core, the vector processors, the DMAC and all the other junk in the EE.
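
The 6 2/3 figure is just the ratio of the two clocks. A small Python sketch of that cycle budget follows; the even split across three units (MIPS core, VU0, VU1) is a hypothetical simplification added here for illustration, not something the post claims.

```python
# Host cycles available per emulated EE cycle, using the clocks from this thread.

ATHLON_HZ = 2_000_000_000
EE_NOMINAL_HZ = 300_000_000
EE_QUOTED_ACTUAL_HZ = 294_000_000  # the "294" mentioned earlier in the thread

def host_cycles_per_guest_cycle(host_hz: int, guest_hz: int) -> float:
    """How many host clock cycles elapse during one guest clock cycle."""
    return host_hz / guest_hz

budget = host_cycles_per_guest_cycle(ATHLON_HZ, EE_NOMINAL_HZ)
budget_294 = host_cycles_per_guest_cycle(ATHLON_HZ, EE_QUOTED_ACTUAL_HZ)

# Hypothetical: share the budget evenly between three units running in parallel.
UNITS_TO_EMULATE = 3
per_unit = budget / UNITS_TO_EMULATE

print(f"Athlon cycles per EE cycle (300 MHz): {budget:.2f}")
print(f"Athlon cycles per EE cycle (294 MHz): {budget_294:.2f}")
print(f"Per unit if split {UNITS_TO_EMULATE} ways: {per_unit:.2f}")
```

Even under that generous split there are only about two host cycles per emulated unit per EE cycle, which is the point being made above.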

Forget actually doing any 3D rendering on top in software; a 2GHz Athlon couldn't come anywhere close to a PS2's rendering performance using nothing but pure CPU processing to draw the graphics. Fortunately, the EE doesn't do any 3D rendering either, it has its own 3D accelerator for that.
 
10FMACs * 2FP/cycle @ 300Mhz = 6GFlops/s.
+
4FDivs * 1/7FP/cycle @ 300Mhz ~ 200MFlop/s.
=
6.2GFlop/s

I thought so. But I read somewhere that the internal EE clock was 150MHz, not 300MHz, and that 300MHz was the clock rate at which the cache worked, in order to fetch 2 instructions at a time.

If the Athlon had to emulate the EE, it would require a LOT more horsepower than a 2GHz chip can provide to do that, even though it's a stronger floating point processor.

No, I don't mean emulate.
Just run a game with the detail of a PS2 game.
I know FP is not everything. But the Athlon has plenty of bandwidth, and lots of RAM and cache. Maybe with a 3D card from before the GeForce (one with no T&L hardware, so it only assists the rendering process), an Athlon could run a game such as Soul Reaver 2.
I run this game on a 2GHz Athlon with a GeForce 256 DDR and it is 100% smooth... so maybe my "theory" is true.
 
you'll never be able to compare any PC part to PS2 in any apples to apples setup, but i think having, say, a 2GHz Athlon and something in the league of a kyro2+ or Voodoo 5 would hold its own vs a PS2, if devs could spend the time optimizing PC code like they can console code.

regardless, OS overhead and memory size/management will always make the 2 platforms difficult to compare.
 
But I read somewhere that the internal EE clock was 150MHz
The EE doesn't have a single "internal" clock, but to answer your question, all the main functional units (R5900 and VUs) run at 300MHz.

in order to fetch 2 instructions at a time.
That would be a function of having a 64-bit bus to the instruction cache.
 
patroclus02 said:
I thought so. But I read somewhere that the internal EE clock was 150MHz, not 300MHz, and that 300MHz was the clock rate at which the cache worked, in order to fetch 2 instructions at a time.

Maybe on wikipedia :

"There is some confusion as to the actual clock-speed of the processor, despite it being officially quoted at 299MHz. This is actually the speed of the bus between the execution units and the internal cache memory. The EE core actually consists of two execution units, both of which run at 149.5MHz.

Access to cache must only take a single CPU clock-cycle, so the bus must run at double the main clock-speed if one execution unit is not to be left waiting while its partner is accessing the bus. For this reason, the quoted speed of the EE is the internal bus speed and not the clock speed of the individual execution units."
 
Rootax said:
Maybe on wikipedia :

"There is some confusion as to the actual clock-speed of the processor, despite it being officially quoted at 299MHz. This is actually the speed of the bus between the execution units and the internal cache memory. The EE core actually consists of two execution units, both of which run at 149.5MHz.

Access to cache must only take a single CPU clock-cycle, so the bus must run at double the main clock-speed if one execution unit is not to be left waiting while its partner is accessing the bus. For this reason, the quoted speed of the EE is the internal bus speed and not the clock speed of the individual execution units."


wow, this is new info for me, about a semiconductor that was completed in 1998,
and whose core clock speed(s) were finalized in 1999.
 
Rootax said:
Maybe on wikipedia :
Wikipedia's got it totally wrong... It's the main internal bus that runs at a half-speed 150MHz; the actual processors run at 300. Or 294, or whatever.
 
patroclus02 said:
I thought so. But I read somewhere that the internal EE clock was 150MHz, not 300MHz, and that 300MHz was the clock rate at which the cache worked, in order to fetch 2 instructions at a time.

No, I don't mean emulate.
Just run a game with the detail of a PS2 game.
I know FP is not everything. But the Athlon has plenty of bandwidth, and lots of RAM and cache. Maybe with a 3D card from before the GeForce (one with no T&L hardware, so it only assists the rendering process), an Athlon could run a game such as Soul Reaver 2.
I run this game on a 2GHz Athlon with a GeForce 256 DDR and it is 100% smooth... so maybe my "theory" is true.

PC hardware (along with just about every console from the Dreamcast onwards, besides the PS2) has entirely different strengths from the PS2. There are things the PS2 is a beast at, yet other things where it fails in comparison to a 1998 gaming computer.

If you wanted to make some kind of comparison, I guess an Athlon processor plus a Voodoo5 6000 would be similar, but I don't think even then you'd need anywhere near a 2GHz processor to match the PS2. You can often even disable hardware T&L in many semi-modern games and they will run fine without it, mainly because most games still do quite a bit of work on the CPU, since they have to support non-T&L cards or fixed-function cards that don't have a lot of capabilities.
 
If my memory doesn't fail me, in a direct comparison of VU1 with the Nvidia T&L and programmable T&L (vertex shader) units, the theoretical performance is:

PS2: 5 FMADD at 300MHz
NV10: 2 FMADD at 120MHz
NV15: 2 FMADD at 200MHz
NV20: 4 FMADD at 250MHz
NV25: 8 FMADD at 250MHz
PSX: 2 FMADD at 33MHz

Given that VU1 was finished in 1999, the EE is impressive hardware compared to the CPUs of its time, thanks to its built-in geometry engine.
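
To put that list on one scale, here is a rough Python sketch that multiplies units by clock and normalises to the PS2 entry. The unit counts and clocks are the from-memory figures listed above, so treat the output as illustrative only; `fmadd_rate_millions` is a name made up for this example.

```python
# Theoretical FMADD throughput = FMADD units * clock, using the list above.
# Figures are the poster's recollection, not verified specifications.

PARTS = {
    # name: (FMADD units, clock in MHz)
    "PS2":  (5, 300),
    "NV10": (2, 120),
    "NV15": (2, 200),
    "NV20": (4, 250),
    "NV25": (8, 250),
    "PSX":  (2, 33),
}

def fmadd_rate_millions(units: int, clock_mhz: int) -> int:
    """Millions of FMADDs per second at theoretical peak."""
    return units * clock_mhz

rates = {name: fmadd_rate_millions(u, mhz) for name, (u, mhz) in PARTS.items()}
ps2_rate = rates["PS2"]

for name, rate in rates.items():
    print(f"{name:5s} {rate:5d} M FMADD/s  ({rate / ps2_rate:.2f}x PS2)")
```

On these numbers only the NV25 entry comes out ahead of the PS2 figure, and the NV10 sits at about a sixth of it.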
 
Guden Oden said:
Given that VU1 was finished in 1999, the EE is impressive hardware compared to the CPUs of its time, thanks to its built-in geometry engine.


finished probably in 1998, architecturally. final clock and adjustments in 1999. but yeah, I agree with you. it was way ahead of its time in pure geometry transform performance.
 
Urian said:
If my memory doesn't fail me, in a direct comparison of VU1 with the Nvidia T&L and programmable T&L (vertex shader) units, the theoretical performance is:

PS2: 5 FMADD at 300MHz
NV10: 2 FMADD at 120MHz
NV15: 2 FMADD at 200MHz
NV20: 4 FMADD at 250MHz
NV25: 8 FMADD at 250MHz
PSX: 2 FMADD at 33MHz

Given that VU1 was finished in 1999, the EE is impressive hardware compared to the CPUs of its time, thanks to its built-in geometry engine.

I recall hearing that the EE was originally just supposed to feature VU0, but VU1 was added in due to developers' requests.
 
Well considering EE does all the transform and lighting duties while on PCs of 1999 you could get a P3 Coppermine (1 GHz at the end of the year) with a GeForce 256 DDR (big T&L onboard), I'd say PS2 wasn't all that mind-blowingly impressive. A console built out of that hardware would've been a formidable opponent to say the least.

Of course, PS2 wasn't designed in 1999 so it's not realistic to make these sort of comparisons. I guess.

As usual, the biggest contrasts come from the open vs. closed platform and the ability to actually utilize the hardware without software with unnecessary robustness getting in the way.
 
swaaye said:
Well considering EE does all the transform and lighting duties while on PCs of 1999 you could get a P3 Coppermine (1 GHz at the end of the year) with a GeForce 256 DDR (big T&L onboard), I'd say PS2 wasn't all that mind-blowingly impressive. A console built out of that hardware would've been a formidable opponent to say the least.

Of course, PS2 wasn't designed in 1999 so it's not realistic to make these sort of comparisons. I guess.

As usual, the biggest contrasts come from the open vs. closed platform and the ability to actually utilize the hardware without software with unnecessary robustness getting in the way.


a console built around a Pentium III Coppermine and NV10 ~ GeForce 256 DDR would not have had the graphics bandwidth that PS2 had for rendering. PS2 would've had many advantages in pure performance, though the PIII + NV10 setup would've had rendering quality advantage. in fact, the Xbox was originally reportedly (in 1999) going to be a P3 or Athlon with NV10 ~ GeForce 256.

PS2's development began way back, around 1995.
 
Well NV10 isn't far behind NV2A in bandwidth. Both use DDR, the NV10 having 150 MHz 128-bit DDR instead of NV2A's 200 MHz 128-bit DDR. NV10 had a lot less need for bandwidth than those later chips though, with no 2nd TMU per pipe.
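
As a rough check on the bandwidth comparison, here is a short Python sketch of peak DDR bandwidth from the clock and bus width given above. It assumes two transfers per clock (which is what DDR means) and ignores real-world efficiency and any sharing of the memory bus; `ddr_bandwidth_bytes_per_s` is a hypothetical helper name.

```python
# Peak memory bandwidth = clock * transfers per clock * bus width in bytes.
# Clock and bus-width figures are the ones quoted in the post above.

def ddr_bandwidth_bytes_per_s(clock_hz: int, bus_bits: int,
                              transfers_per_clock: int = 2) -> int:
    """Theoretical peak bandwidth in bytes per second for a DDR bus."""
    return clock_hz * transfers_per_clock * bus_bits // 8

nv10 = ddr_bandwidth_bytes_per_s(150_000_000, 128)  # GeForce 256 DDR
nv2a = ddr_bandwidth_bytes_per_s(200_000_000, 128)  # Xbox GPU

print(f"NV10: {nv10 / 1e9:.1f} GB/s")
print(f"NV2A: {nv2a / 1e9:.1f} GB/s")
print(f"NV10 / NV2A: {nv10 / nv2a:.2f}")
```

That puts the NV10 at three quarters of the NV2A's peak figure, which fits "isn't far behind".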

Yeah I remember seeing the initial rumors of Xbox, with the Athlon instead of the P3 and the NV10-ish GPU.

PS2's brute force "power" is awfully inefficient in-practice from what we've seen. It has taken years to get the machine really looking good, and it's still well behind 'Cube and Xbox. The machine's raw numbers don't do well against more efficient architectures.

Gamecube has basically an NV10 I'd say. Flipper has some more functionality in shader capabilities, but it's a 4x1 architecture with DX7 T&L for the most part, just like NV10. Its 162 MHz clock is a bit ahead though. NV10 was usually around 120 MHz.

I think a NV10/P3 console would be easily comparable to PS2. And like you say, it would probably look better in quality of rendering (i.e. the texture filtering quality and whatnot).
 
swaaye said:
Well NV10 isn't far behind NV2A in bandwidth. Both use DDR, the NV10 having 150 MHz 128-bit DDR instead of NV2A's 200 MHz 128-bit DDR. NV10 had a lot less need for bandwidth than those later chips though, with no 2nd TMU per pipe.

Yeah I remember seeing the initial rumors of Xbox, with the Athlon instead of the P3 and the NV10-ish GPU.

PS2's brute force "power" is awfully inefficient in-practice from what we've seen. It has taken years to get the machine really looking good, and it's still well behind 'Cube and Xbox. The machine's raw numbers don't do well against more efficient architectures.

Gamecube has basically an NV10 I'd say. Flipper has some more functionality in shader capabilities, but it's a 4x1 architecture with DX7 T&L for the most part, just like NV10. Its 162 MHz clock is a bit ahead though. NV10 was usually around 120 MHz.

I think a NV10/P3 console would be easily comparable to PS2. And like you say, it would probably look better in quality of rendering (i.e. the texture filtering quality and whatnot).


actually that's a very well thought-out post, Swaaye. one that I cannot really disagree with. I guess, while designing the PS2 in the mid to late 1990s, SCEI saw bandwidth, peak pixel fillrates and peak polygon rates as their goal. since they designed everything in-house (the GS) and the EE with Toshiba, they didn't have the ability to add more rendering features or, in the final PS2, the best-quality implementations of the rendering features the GS already had (i.e. mip-mapping). in order to compete with Dreamcast and PC hardware, SCEI had to go for massive floating point performance, massive bandwidth, massive peak fillrate and polygon rates, but sacrificed efficiency and ease of programmability.

I know some devs here that have actual experience on PS2 may disagree with me. this is just my perspective as an outsider :)
 
swaaye said:
Well NV10 isn't far behind NV2A in bandwidth. Both use DDR, the NV10 having 150 MHz 128-bit DDR instead of NV2A's 200 MHz 128-bit DDR. NV10 had a lot less need for bandwidth than those later chips though, with no 2nd TMU per pipe.

Yeah I remember seeing the initial rumors of Xbox, with the Athlon instead of the P3 and the NV10-ish GPU.

PS2's brute force "power" is awfully inefficient in-practice from what we've seen. It has taken years to get the machine really looking good, and it's still well behind 'Cube and Xbox. The machine's raw numbers don't do well against more efficient architectures.

Gamecube has basically an NV10 I'd say. Flipper has some more functionality in shader capabilities, but it's a 4x1 architecture with DX7 T&L for the most part, just like NV10. Its 162 MHz clock is a bit ahead though. NV10 was usually around 120 MHz.

I think a NV10/P3 console would be easily comparable to PS2. And like you say, it would probably look better in quality of rendering (i.e. the texture filtering quality and whatnot).

I believe the Gamecube gets a bit better performance per MHz than NV10 due to a lower hit from enabling certain features. The Gamecube certainly seems more efficient than its specs indicate; its raw specs suggest something closer to a Dreamcast level of performance.
 