The PlayStation 3 and the game industry! Part 1...

fouad · Jul 11, 2005

I think it is time for my article to appear on the internet, on this great forum (I've seen and followed a lot of forums, but the beyond3d forum is one of the best, most serious and most enjoyable forums on the net!).
I have heard and read a lot of crap: misunderstandings, misleading analysis, false statements, lies, hypocrisy, etc., and I just can't keep reading so much crap without doing anything. So I decided to write this article, based on a lot of sources and relying on what logical analysis agrees with. I hope people will enjoy it, and will finally hear, understand and enjoy the truth!

1/ PS3 design :

A/ Why would SONY spend more than 5 years of development, collaborating with Toshiba and IBM (one of the best companies in the world in semiconductor and supercomputer technology), and spend more than 3 billion $ (2 billion $ on R&D and more than 1 billion $ on building factories), using some of the most brilliant computer scientists in the world, to develop the CELL? WHY? And why, after all this, is the CELL only about TWICE as powerful as the XeCPU in floating point calculations, and less powerful than the XeCPU in general purpose integer calculations? Is Sony (Ken Kutaragi and his team) crazy? Stupid? Wasting time and money on something not worth it, when they could have done like Microsoft (which spent less than 1 billion $ on the XeCPU) and had, by spring 2006, a CPU a little more powerful than the XeCPU but more efficient and less expensive than the CELL? OR is there something a lot of people didn't understand about the CELL? Is the CELL more than a stupid investment from Sony?

Answer : YES, of course. The CELL is more than a stupid idea. From what a lot of people say on the internet, I get the impression that they think the CELL is a stupid idea from Sony and nothing but marketing! So I can see how badly people underestimate the importance of the CELL and its power for the PS3, and how they don't understand why 5 years of development and 3 billion $ were spent on developing it. I will try to answer the question in the following points :

1.1/ The power of the CELL :

It's true that the CELL is only about twice as powerful as the XeCPU in terms of floating point calculations (the CELL of the PS3 has one PPE plus 7 working SPEs running at 3.2 GHz), and that the CELL is less powerful in general purpose integer calculations (if we suppose that the single PPE is roughly as powerful as one of the 3 cores of the XeCPU, although this is of course not exact). BUT you have to know that floating point calculations are more important in multimedia and video games than general purpose integer calculations. Why?
The definition of FLOPS given by the online encyclopedia Webopedia is : "Short for floating-point operations per second, a common benchmark measurement for rating the speed of microprocessors. Floating-point operations include any operations that involve fractional numbers. Such operations, which take much longer to compute than integer operations, occur often in some applications."
Source : (Webopedia)
http://www.webopedia.com/TERM/F/FLOPS.html

Floating point calculations are the way to go if a developer wants to create a very complicated, specialized physics engine or AI engine, animations, simulations, etc. The CELL is not a PC where you run 5 MS Word windows, 3 Excel windows, Media Player playing mp3 music in the background, 10 Internet Explorer windows, and other programs downloading games and movies from the internet, all at the same time, so that you need a lot of RAM, a big HDD and a lot of general purpose integer calculations. NO! The CELL is not designed for this kind of work. Like the Emotion Engine of the PS2, it is designed mainly for multimedia and video games (and the CELL today is even more ambitious than the EE was in its time). Those applications don't need a lot of general purpose integer performance, but they do need a lot of floating point performance. SO Sony would have been foolish to spend 3 billion $ and 5 years of development, collaborating with the best semiconductor companies in the world (Toshiba and IBM), to create a CPU that is a monster in integer calculations and not so good in floating point calculations. Sony concentrated mainly on floating point calculations because the main purpose of the CELL is to run games, not MS Office applications.

Want proof that FLOPS are the way to go to create physics, AI, complicated animations, complicated simulations ? OK, I will give 2 main proofs :

1/ The Emotion Engine of the PS2! This is the best proof. You may wonder how a CPU created in 1999 did almost everything that the GPU+CPU of the Xbox 1 did when running multi-platform games (gameplay, physics, AI, particle effects, vertices, any visual effect you see on screen, etc.), to the point where you could consider the GPU of the PS2 (the GS, the Graphics Synthesizer) as only a ROP unit plus texture processing. How? The answer is : the FLOPS capabilities of the EE (6.2 GFLOPS compared to 1.4 GFLOPS for the Xbox 1 CPU, a hybrid of Pentium III and Celeron; this is why the EE is far more powerful than the Xbox CPU).
2/ Second proof : if you still don't believe me that FLOPS are the way to go for complicated physics, then look at AGEIA, which intends to release a PPU (physics processing unit) built mainly around FLOPS!

Also, if "twice as powerful" does not impress you, then think about it this way : the XeCPU is already unbelievably powerful. Believe it or not, the most powerful Intel or AMD CPU that you can buy for home use does no more than about 10 GFLOPS, so the 115 GFLOPS of the XeCPU is just enormous. So if the CELL is twice as powerful as a CPU that is already unbelievably powerful, this is nothing less than revolutionary. Imagine the effort, money and time MS and IBM would have to spend to make the XeCPU even more powerful. Ask them how much time they would need to make their CPU more powerful than the CELL in floating point calculations, and they will answer : a lot of time!
Because, as you know, going from 100 GFLOPS to 200 GFLOPS is a lot more difficult than going from 6 GFLOPS to 12 GFLOPS (which is also a doubling). Going from 100 to 200 also means the CELL can execute 100 GFLOPS more than the XeCPU! It's not like going from 6 to 12 (6 GFLOPS more). NO, it's 100 GFLOPS more.
Ask Intel, for example, how difficult it was and how much time it took to double the power of a 2 GHz P4 (it took them 3 years, from 2002 (2 GHz) to the beginning of 2005 (3.4 GHz)). And ask NVIDIA why its GF6, created in 8 months, was twice (2x) as powerful as the GF5, while the GF7, created in 12 months, is only 1.5 times as powerful as the GF6. The answer is easy : the 2 GHz P4 and the GF6 were already very powerful, so doubling their power is a very difficult task and takes much more time.
So we must not underestimate the power of the CELL, and we have to understand that the CELL won't be topped in floating point power by any PC CPU for at least 4 years. So in terms of pure power for video games the CELL is revolutionary, and the 3 billion $, the 5 years of development and the collaboration with IBM and Toshiba are all fully justified.
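
To make the "twice as powerful" comparison concrete, here is a rough sketch of how a peak FLOPS figure is usually put together: number of units x clock x floating point operations per cycle. The unit counts and the 3.2 GHz clock come from the post; the flops-per-cycle values are assumptions (8 for each CELL unit, 12 for each XeCPU core, the latter picked because it reproduces the 115 GFLOPS figure quoted above). These are theoretical peaks, not measured performance.

```python
# Peak single-precision GFLOPS = units x clock (GHz) x flops per cycle per unit.
# The flops-per-cycle values are ASSUMPTIONS for illustration only.

def peak_gflops(units, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS for `units` identical execution units."""
    return units * clock_ghz * flops_per_cycle

CLOCK_GHZ = 3.2  # clock quoted in the post

# CELL in the PS3: 7 working SPEs plus 1 PPE (from the post).
# Assumed: each unit issues one 4-wide multiply-add per cycle = 8 flops.
cell = peak_gflops(7, CLOCK_GHZ, 8) + peak_gflops(1, CLOCK_GHZ, 8)

# XeCPU: 3 cores (from the post).
# Assumed: 12 flops per cycle per core, which gives about 115 GFLOPS.
xecpu = peak_gflops(3, CLOCK_GHZ, 12)

print(f"CELL  peak: {cell:.1f} GFLOPS")
print(f"XeCPU peak: {xecpu:.1f} GFLOPS")
print(f"ratio: {cell / xecpu:.2f}x, difference: {cell - xecpu:.1f} GFLOPS")
```

Under these assumptions the ratio comes out a little under 2x, and the absolute gap is on the order of 90 GFLOPS, which is the "100 GFLOPS more" point made above in round numbers.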

1.2/ The CELL is more than a powerful CPU :

The challenge when creating the CELL was not only to create a powerful CPU, but also to create good interfaces that could feed it with bandwidth (so they used the most advanced technology in this domain, FlexIO from Rambus). The other challenge was to create a great internal memory controller with high bandwidth, to include a local memory in each SPE, and to make the management of the bandwidth as efficient as it could be. In theory the CELL is a very simple design : a PPE with a lot of SPEs. In practice it's another story. Executing the design was a very difficult task for Toshiba, IBM and Sony in terms of semiconductor production technology.

1.3/ CELL can connect automatically with any other CELL :

Yes, one of the main objectives of Ken Kutaragi in creating the CELL was to make it able to form a virtual network by connecting many CELLs. Again, in theory this is very simple, but in practice it's a different story. CELLs can connect to each other on the same board and work together automatically (unlike the classic home-use P4s from Intel, where only 2 of them can work together automatically, many more CELLs can work together on the same board). CELLs can also connect to each other over a network, so they can exchange data or even work together! All this with a high level of automatic security.

1.4/ SONY and TOSHIBA will produce the CELL, not IBM :

If Sony spent 3 billion $ on the CELL, part of it went into building its own manufacturing, which will allow Sony to decrease costs in the long term. Microsoft, on the other hand, will not produce the XeCPU itself and will pay other companies to produce it.

2/ XBOX 360 is an unbalanced system :

Is it true that the Xbox 360 can do free 4x AA at 720p resolution? Of course NOT...

When MS announced that the Xbox 360 would have 10 MB of eDRAM at 256 GB/s, and that this would allow it to do free AA at HDTV resolutions, anyone with even a little knowledge of hardware knew that 10 MB is just insufficient for this. So everyone wondered what the truth was. Fortunately, the beyond3d article by Dave BAUMANN about the Xbox 360 graphics clarified all this : the 10 MB of eDRAM on the Xbox 360 is not sufficient to do 4x AA at 720p and 1080i resolutions. So the solution was tiled rendering (as Dave Baumann's article said, the solution is to divide the screen into multiple portions that fit within the eDRAM render buffer space). As you can see, needing a tiling technique to do anti-aliasing doesn't make it free anti-aliasing, because rendering each extra tile consumes power. So there is no real free AA at HDTV resolutions on the Xbox 360. Furthermore, Dave's article contains the following : (ATI have been quoted as suggesting that 720p resolutions with 4x FSAA, which would require three tiles, has about 95% of the performance of 2x FSAA.) Do you know what this means?
It means that, if what they are saying is true, going from 2x AA at 720p to 4x AA at the same resolution costs the Xbox 360 5% of its performance! And if we assume the same rate, then going from no AA at 720p (no tiling required) to 2x AA (2 tiles required) costs at least another 5%. In total, going from no AA at 720p to 4x AA at 720p costs the Xbox 360 roughly 10% of its performance at a minimum. Now where is the free AA here?
Do the same analysis for 4x AA at 1080i and you will conclude that going from no AA at 1080i (no tiling needed) to 4x AA at 1080i (4 tiles needed) costs at least 15% in performance!
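
The tile counts and the cumulative hit above can be checked with a short calculation. This is only a sketch under stated assumptions : every AA sample stores 32-bit colour plus 32-bit depth/stencil, the whole 10 MB is available for the render buffer, and 1080i is rendered one 1920x540 field at a time. The 5% cost per step is the figure assumed in the post, not a measurement.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # 10 MB of eDRAM (from the post)
BYTES_PER_SAMPLE = 4 + 4        # ASSUMED: 32-bit colour + 32-bit depth/stencil

def tiles_needed(width, height, aa_samples):
    """Number of eDRAM-sized tiles needed to hold the multisampled buffer."""
    buffer_bytes = width * height * aa_samples * BYTES_PER_SAMPLE
    return math.ceil(buffer_bytes / EDRAM_BYTES)

def cumulative_hit(step_hits):
    """Total performance loss when each step costs a fraction of what is left."""
    remaining = 1.0
    for hit in step_hits:
        remaining *= 1.0 - hit
    return 1.0 - remaining

# ASSUMED: 1080i is rendered as a single 1920x540 field per pass.
modes = {"720p": (1280, 720), "1080i field": (1920, 540)}

for name, (width, height) in modes.items():
    for aa in (1, 2, 4):
        print(f"{name:12s} {aa}x AA -> {tiles_needed(width, height, aa)} tile(s)")

# Two steps of 5% each (no AA -> 2x -> 4x at 720p), as assumed in the post.
print(f"no AA -> 4x AA at 720p: {cumulative_hit([0.05, 0.05]):.2%} slower")
```

With these assumptions 720p needs 1, 2 and 3 tiles for no AA, 2x and 4x, and a 1080i field needs 4 tiles at 4x, which agrees with the tile counts used above. Two 5% steps compound to a little under 10%.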
And if Microsoft considers this free AA, then even the PS3 has almost free AA with its RSX (a real hit of about 20-30% when using 4x AA at 1080i in today's games; officially it's even less than 10%, just like the official Microsoft numbers for the AA hit on Xenos! Ironic... And this won't change much in future games, because they will be shader intensive rather than polygon intensive or even texture intensive).
So please don't be fooled by the supposed advantage of free AA thanks to 10 MB of eDRAM. As you have seen, it is not true : there is no free AA at HDTV resolutions on the Xbox 360.
So please stop believing anything before you are sure of it.

Now let's look at the second main advantage of having eDRAM on the Xbox 360 : saving bandwidth, by using the bandwidth of the eDRAM for AA, Z-buffer and stencil, so that the bandwidth of the GDDR3 RAM can be used for shaders and textures. But even this relative advantage over the PS3 is simply killed by the fact that XENOS and the XeCPU share the same memory bandwidth (22.4 GB/s). If we assume the CPU needs 50% of that bandwidth, 11.2 GB/s (when running complicated physics, AI and animation engines), then only 11.2 GB/s is left for the GPU, which is not sufficient and will limit the power of the Xbox 360 GPU.
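
The bandwidth split can be sketched in a few lines. The bus width and transfer rate below are assumptions (a 128-bit GDDR3 bus at 1400 million transfers per second, which works out to 22.4 GB/s, in line with the shared figure of roughly 22 GB/s used above), and the CPU share is a free parameter, since the 50% used in the post is itself a guess.

```python
def bus_bandwidth_gbs(bus_bits, mega_transfers_per_s):
    """Peak bandwidth in GB/s for a bus of `bus_bits` width."""
    return bus_bits / 8 * mega_transfers_per_s / 1000

def gpu_bandwidth_gbs(total_gbs, cpu_fraction):
    """Bandwidth left for the GPU when the CPU takes `cpu_fraction` of it."""
    return total_gbs * (1.0 - cpu_fraction)

# ASSUMED memory configuration: 128-bit bus, 1400 MT/s.
total = bus_bandwidth_gbs(128, 1400)
print(f"shared GDDR3 bandwidth: {total:.1f} GB/s")

# Sweep a few hypothetical CPU shares; 0.5 is the case discussed in the post.
for cpu_fraction in (0.0, 0.25, 0.5):
    left = gpu_bandwidth_gbs(total, cpu_fraction)
    print(f"CPU takes {cpu_fraction:.0%} -> GPU gets {left:.1f} GB/s")
```

The point of the sweep is that the conclusion depends heavily on how much bandwidth the CPU really consumes, which nobody outside the development teams knows yet.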

NOTE : With the SATURN-3DO-PS1-N64 generation (calling it the 32 bit generation is not accurate, as you know), the trend (or the focus of developers) was to create more polygons. With the DC-PS2-XBOX-GC generation the focus was on better textures (Sony failed to anticipate this, which led them to leave out S3TC texture compression and to choose bandwidth over quantity of memory, a wrong choice for the GPU but a great choice for the CPU). With next generation games the focus will be on better, more complex shaders. This will reduce the need for more bandwidth and more memory (I am not saying bandwidth and memory quantity aren't a bottleneck on the PS3, but I am saying they are less of a problem than in the previous generation). So in the next generation we will return to the situation of the first generation of 3D consoles : the CPU will be more important (complex animations, physics, AI and simulations), and the limit will be processing power (to run longer, more complicated shaders) rather than bandwidth or memory quantity. (Thank GOD for this, because as you know it's easier for manufacturers to improve processing power than to improve bandwidth.)

Disadvantages of 10 MB of eDRAM :

We have just seen that the advantages of having eDRAM are killed. Now let's look at the disadvantages : there are 332 million transistors on XENOS, but only 252 million of them are used for LOGIC and calculations (processing) : 232 million transistors for the mother die (pipelines) and 20 million transistors on the daughter die for the back buffer logic (AA, Z-buffer and stencil).
So if there were no eDRAM, ATI could use all 332 million transistors for logic and processing power! (Compare with the 300 million, or a little more, transistors of the RSX.) This means that the RSX has 300 million transistors dedicated to processing, and Xenos only 252 million.
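
A quick sanity check of this transistor budget, using only the numbers quoted above. The one assumption added here is that an eDRAM cell costs about one transistor per bit (the usual one-transistor, one-capacitor DRAM cell), which is what makes the leftover 80 million transistors plausible for 10 MB.

```python
# Figures quoted in the post, in millions of transistors.
XENOS_TOTAL_M = 332
MOTHER_DIE_LOGIC_M = 232
DAUGHTER_DIE_LOGIC_M = 20
RSX_TOTAL_M = 300  # "300 or a little more" according to the post

logic_m = MOTHER_DIE_LOGIC_M + DAUGHTER_DIE_LOGIC_M  # transistors doing processing
edram_m = XENOS_TOTAL_M - logic_m                    # what is left for the eDRAM

# 10 MB of eDRAM expressed in millions of bits.
edram_mbits = 10 * 1024 * 1024 * 8 / 1e6

# ASSUMED: roughly one transistor per eDRAM bit (1T1C cell).
bits_per_transistor = edram_mbits / edram_m

print(f"Xenos logic: {logic_m} M, eDRAM: {edram_m} M transistors")
print(f"eDRAM bits per transistor: {bits_per_transistor:.2f}")
print(f"RSX / Xenos logic transistors: {RSX_TOTAL_M / logic_m:.2f}x")
```

Note that the transistor ratio alone comes out at roughly 1.2x, so a raw transistor count does not by itself establish any larger performance gap.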

Now let's look more closely at the RSX. Sure, we don't know in detail what kind of GPU the RSX will be, but we know ENOUGH, at least to make a comparison with XENOS.

We know 3 things about the RSX :
1/ It's based on the G70 from NVIDIA, so we won't see radical differences in design, such as a big difference in the number of pixel or vertex shaders or in how complicated they are.
2/ It will be built on a 90 nm process and run at 550 MHz (or not far from this).
3/ It has been designed to work with the CELL, so the RSX uses almost the same floating point programming model as the CELL (no conversion is needed for the two to communicate efficiently; each one understands the other), and the interface to the CELL is FlexIO at 35 GB/s.

As for the rumors that the G70 has more than 32 pipelines, some of them disabled by NVIDIA : this is NOT TRUE! It's impossible, technically and logically.
(I could elaborate on this in the forum, or maybe in part 2 of my article.)
But for now let's concentrate on the RSX. It won't be a 32 pipeline GPU, but only 30 or even 28 (2 or 4 will be disabled for redundancy, like the one disabled SPE of the CELL), and this is for sure.

So let's assume there are 30 pipelines on the RSX (22 pixel pipelines and 8 vertex pipelines).
The 8 vertex pipelines of the RSX are as capable as the pipelines of XENOS, but the 22 pixel pipelines are a lot more powerful. Also, the RSX runs at 550 MHz, while Xenos runs at 500 MHz.

So in terms of raw power the RSX is much more powerful ( 1.5 times more powerful ) than xenos.

But Microsoft and ATI claim that XENOS is much more efficient than the RSX, and that this efficiency makes it more powerful! Let's analyse this :
Xenos has a unified shader architecture (each pipeline can run either a vertex shader or a pixel shader).
The benefits are clear : this makes the GPU more flexible and more efficient. WHY? Suppose that on the RSX a game needs fewer than 8 vertex shaders (say only 6) : then you have 2 wasted vertex shaders! Now suppose the game needs more than 8 vertex shaders, say 10 : you only have 8, and you can't use the pixel pipelines to run vertex shaders. With a unified architecture you use the GPU more efficiently, distributing the workload among the pipelines as needed by the scenes of the game.
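
The efficiency argument can be illustrated with a toy model. This is only a sketch, not a description of how either chip schedules work : it assumes 8 vertex and 22 pixel units for the fixed design, compares them against a hypothetical unified design with the same 30 units, treats every unit as equally fast, and ignores all scheduling overhead. The workload numbers are invented.

```python
def fixed_split_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Time to finish a frame when each unit type can only run its own shaders.

    The slower of the two pools decides the frame time; the other pool idles.
    """
    return max(vertex_work / vertex_units, pixel_work / pixel_units)

def unified_time(vertex_work, pixel_work, total_units):
    """Time to finish a frame when any unit can run any shader (ideal case)."""
    return (vertex_work + pixel_work) / total_units

VERTEX_UNITS, PIXEL_UNITS = 8, 22  # ASSUMED fixed split

# Hypothetical frames: (name, vertex work, pixel work) in arbitrary units.
frames = [
    ("matches the split", 80, 220),
    ("vertex heavy", 150, 150),
    ("pixel heavy", 20, 280),
]

for name, vertex_work, pixel_work in frames:
    fixed = fixed_split_time(vertex_work, pixel_work, VERTEX_UNITS, PIXEL_UNITS)
    unified = unified_time(vertex_work, pixel_work, VERTEX_UNITS + PIXEL_UNITS)
    print(f"{name:18s} fixed: {fixed:6.2f}  unified: {unified:6.2f}")
```

In this model the unified design is never slower for the same number of units, and the gain is largest when the workload is far from the fixed 8/22 ratio. When the workload happens to match the split, both designs finish at the same time.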

It's clear that a unified architecture is the future of GPUs. But saying this raises some important questions about the real efficiency of Xenos, because, as you know, Xenos is the first unified shader architecture, not only from ATI but in the whole world! There is no prior experience in building a unified architecture, so we don't know whether Xenos will really be as efficient as claimed. For example, a unified pipeline raises problems that didn't exist in a separate pipeline architecture : the results of vertex processing must have sufficient memory on the GPU to be stored until they are picked up by a free pipeline, and then those results must be sent quickly and efficiently to pixel processing. Since each pipeline can execute only a vertex shader or a pixel shader at any one time, this is a big problem : if the result of a vertex shader has no memory to be stored in, or is not sent quickly to a suitable free pipeline, then the system simply stalls! Of course ATI thought hard about those problems and implemented solutions to them, but how efficient those solutions are, especially for a first attempt at a unified architecture, will determine how efficient the new architecture is.
Let's assume this new architecture is 100% efficient and none of the problems mentioned above occur when processing data. Then we can understand why ATI and MS claim that the unified architecture is more efficient than a separate architecture. But does this make XENOS more powerful than the RSX? NO.

For many reasons :

1/ The raw power of the RSX is 1.5 times that of Xenos, and even if we accept that Xenos is more efficient, this only takes full effect in a few special cases. Even if we consider only those cases, the extra efficiency just won't make Xenos as powerful as the RSX.

2/ Video games need a lot more pixel shaders than vertex shaders. (This is obvious, because on screen there are a lot more pixels than vertices. Also, higher resolution and higher quality textures and particles make far more visual impact than just more polygons or geometry, so since the 128 bit generation developers have concentrated less and less on adding polygons to their games.) This reduces the efficiency gain from having unified pipelines.

So for the Xbox 360 to be as efficient as possible against the PS3, a lot of conditions must hold :

1/ The game must need more than 8 vertex shaders (very complex geometry) and little pixel shading. (As I said, the RSX is better than Xenos at running pixel shader intensive engines with little vertex shading.)

2/ The game runs at 720p with 2x anti-aliasing. (So no tiling is needed, and we have the real, totally free AA claimed by MS.)

3/ There are no complicated physics, collision detection, particle physics, real time calculated animations, complicated simulations or complicated AI. Then the CPU won't be a limiting factor for the GPU and won't need a lot of bandwidth, so the free bandwidth can be used by the Xenos GPU.

Now let's look at what future games will demand :

1/ Complicated physics, AI, animations and simulations.

2/ A lot more complicated pixel shaders.

3/ 1080i and 1080p resolutions with 4x anti-aliasing, 8x anisotropic filtering, HOS (high order surfaces), 64 bit or even 128 bit HDR (crazy Kazunori Yamauchi and his team! We know him! He implemented 1080i for GT4 on the PS2, and he dropped the online mode at the last minute after years of working on it, which was a big shock for me!), high resolution normal mapping, soft self shadowing, advanced motion blur effects, etc.
(The only future effect that suits Xenos is displacement mapping, and unfortunately it seems future games won't use much displacement mapping, for many reasons that are not our subject here.)

We clearly see that those conditions reduce the efficiency gain of Xenos against the RSX. Given the 1.5 times greater raw power of the RSX, this makes the PS3 more capable graphically than the Xbox 360, and clever developers on the PS3 could create things that are simply impossible to run on the Xbox 360.

BUT... the big differences that you will see between Xbox 360 games and some PS3 games aren't in graphics, but in other departments of next generation video games. That is for PART 2 of my article. (I haven't finished it yet. In part 2 I will also talk about the disadvantages of the PS3, problems with the Sony and Microsoft strategies, the video game industry, etc.)

Excuse me for the bad English and for being too long, and feel free to correct me, criticize, or just debate...

Johnny Awesome · Jul 11, 2005

Quaz51 · Jul 11, 2005

ralexand · Jul 11, 2005

What the hell is this?

Is that you, KK?

overclocked · Jul 11, 2005
