Look at this Google-cached (pulled down) PlayStation 3 page

Sony believes it could "teach" average developers to write efficient applications for PSX3, much the way average developers were taught to deal with polygon rendering and PSX2 in the 90's.

do I detect a note of sarcasm here? :p
 
Deadmeat said:
And you have to resort to custom coding if you want your games to be graphically outstanding, which is expensive.
Technologically outstanding games are always difficult (and expensive) to create, regardless of the platform. It could also be argued that easier platforms make them even more difficult to make (harder to stand out), but since we'd mostly be going on circumstantial evidence for both sides of the argument it's ultimately pointless to go into it.
Graphically outstanding games are not necessarily always technologically outstanding though (and vice versa).

Moreover, I like to believe that licensing tech doesn't mean people will eventually be doing "plug and play" with entire game engines, shipping them with scarcely any modifications save for slightly different art. (Granted, we've seen things nearly as bad happen with PC FPS games at "one" point, but thankfully that trend didn't last long.)

Show me how you arrived at your calculation.
Same way as you did: 1 TFLOPS / (max. number of operands possible per frame (512 MB / 4) * 60).

"Once again, show me how you got that ratio."
See above, but operands per frame is 512 MB / 16 (accessing 25% of memory).
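Spelled out a bit more (assuming 4-byte operands and 60fps; purely back-of-the-envelope):

```c
/* Back-of-the-envelope version of the two ratios above.  Assumptions
 * (mine, for illustration): 4-byte operands, 512 MB = 512 * 2^20 bytes,
 * 60 frames per second, "1 Teraflops" = 1e12 FLOPS. */
#include <stdio.h>

int main(void)
{
    const double flops         = 1e12;
    const double mem_bytes     = 512.0 * 1024.0 * 1024.0;
    const double operand_bytes = 4.0;
    const double fps           = 60.0;

    /* touching all of memory once per frame (the 512MB/4 case) */
    double all_mem = (mem_bytes / operand_bytes) * fps;
    printf("all memory per frame : ~%.0f ops per operand\n", flops / all_mem);

    /* touching 25% of memory per frame (the 512MB/16 case) */
    double quarter = (mem_bytes / (operand_bytes * 4.0)) * fps;
    printf("25%% of memory/frame  : ~%.0f ops per operand\n", flops / quarter);
    return 0;
}
```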

Looking forward to Faf Racer2 on PSX3 to prove your word; BTW, what was the title of Faf Racer1 again??? I have not followed the status of your racer for a while.
We're going through final QA right now. Anyway, I'm not yet sure if I want to work on many more racing games after this.

Like everything is in voxels or bezier curves in the PSX3 generation and needs a shitload of calculations to render and light???
A shitload of calculations sounds about right to me, especially compared to the current gen, which is still dominated by simpler forms of lighting along with using as much prelit stuff as possible.
 
...and I thought that to make a game look good you need a good artist and decent enough technology...
a game will look good as long as the artist is good.
without good artwork, you can push as many polygons and textures as you want, the game is still gonna look like crap...

it's like saying that if you use a hi-def camera, then you'll win an Oscar for your movie in whatever category it is that takes care of how good a movie looks (got it on the tip of my tongue, just can't get it out of my brain...).
if you use Terminator 1 scenery in your movie, it will look like crap even though you used the latest hi-def camera.

still, if you have artists that can come up with scenery that looks as good as LOTR, then having the latest and best technology doesn't really matter; it will still look much better than the movie shot with the hi-def camera...

not sure the example is the clearest, but hey, at least I tried... sounded much less complicated in my head... :LOL:
 
ARM is everywhere because it is everything CELL isn't.

1. Simple.
2. Inexpensive.
3. Easy to develop software for.
4. Low power consumption.
5. A dozen 2nd sources competing on price.
6. Excellent tools.

Dunno about simple... ARM10 and ARM11 cores have fattened up quite a bit in the past couple of years, to the point that they're not so simple anymore (certainly not like back in the ARM610 and earlier days).

Costs depend on the core you're looking at. Obviously there are many factors, like whether you're using a standard core, an obscure one from a licensee, or synthesizing your own.

Ease of development isn't really that much of an issue... The maturity of the ISA will obviously influence how mature its software tools are (which will in turn determine the relative quality of development on that platform).

As far as power consumption goes, it's not the marvel it once was. There are many non-ARM processor cores out there that easily outperform it.

2nd sources just means there are a lot of licensees. That can easily apply to any ISA that appeals to device developers; MIPS also flourishes on the same model as ARM.

"Excellent tools" is a redundant argument of #3...

Still, this does not resolve the fundamental issue of PSX3 development complexity.

Well, considering the "PSX" hasn't been released yet, I don't think we need to worry too much about the "PSX3" quite yet... :p

This is what they do at NASA and the NSA to enhance satellite pics.

Whoa! My digital point & shoot has the powar of NASA and the NSA in the palm of my hand then!!! :oops:

Cramming more processors into a single die is not the solution to the performance problem. DirectX works because it makes parallel shaders largely invisible. Maybe MS will be getting its big break with Xbox2 after all.

I thought that this was funny since you don't seem to realize the contradiction here... :p

It could also be argued that easier platforms make them even more difficult to make (harder to stand out), but since we'd mostly be going on circumstantial evidence for both sides of the argument it's ultimately pointless to go into it.

Likewise, "easier platforms" can sometimes make solutions to difficult problems hard to solve since within the "easy" framework a lot of decisions are made for you and thus your options can be more limited, and stepping outside that framework can avail one to some rather painful processes in order to find a solution to the problem... (or just make some undesirable compromises)
 
...

To Faf

It could also be argued that easier platforms make them even more difficult to make (harder to stand out)
Developers could then compete on the merits of game play and artwork on even ground, which is actually a desirable thing.

Graphically outstanding games are not necessarily always technologically outstanding though (and vice versa).
Exactly, but smaller developers with little budget might not even get the chance to show their artistic skills because of their coding problems.

granted, we've seen things nearly as bad happen with PC FPS games at "one" point, but thankfully that trend didn't last long.
Things look that way on PSX2; games pretty much look alike because smaller developers are now forced to scrap their own engine development and license "standard" engines instead.

1Teraflops / (Max. number of operands possible per frame (512MB/4)* 60).
A meaningless figure, since PSX3 cannot sustain 1 TFLOPS; even 200 GFLOPS would be difficult.

Typical C/C++-generated code consists of 40% MOV instructions. I am not sure what percentage of those MOV instructions are load/store-type memory instructions, but let's say half for the sake of argument, so the average ratio of all operations to memory access instructions is 5:1, which sounds about right.

Assume that the CELL cache has a 95% hit rate (I am being very generous here), incurring one off-chip memory access for every twenty memory access instructions in the code stream.

5:1 × 20:1 = 100:1 maximum.

In a bandwidth-restricted architecture like CELL, bandwidth determines the sustainable FLOPS. Going by the real-world bandwidth estimation of Yellowstone at 12 GB/s (3 billion operands), you get a maximum attainable FLOPS figure of 600 GFLOPS. (If the cache hit rate is 90%, then 300 GFLOPS max.)
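Spelling that chain out explicitly (all inputs are the figures claimed above; note that landing on 600 rather than 300 GFLOPS requires counting each operation as roughly two FLOPs, e.g. a multiply-add, which is an extra assumption):

```c
/* The bandwidth-limited estimate above, step by step.  Every input is a
 * figure claimed in that post; the 2-FLOPs-per-op doubling at the end is
 * an added assumption needed to reach the quoted 600 GFLOPS. */
#include <stdio.h>

int main(void)
{
    const double mov_fraction    = 0.40;  /* "40% MOV instructions"            */
    const double loadstore_share = 0.50;  /* "half of those are load/store"    */
    const double hit_rate        = 0.95;  /* "95% cache hit rate"              */
    const double yellowstone_bw  = 12e9;  /* claimed real-world 12 GB/s        */
    const double operand_bytes   = 4.0;

    double ops_per_mem  = 1.0 / (mov_fraction * loadstore_share); /*   5:1   */
    double mem_per_miss = 1.0 / (1.0 - hit_rate);                 /*  20:1   */
    double ops_per_miss = ops_per_mem * mem_per_miss;             /* 100:1   */

    double offchip_operands_per_s = yellowstone_bw / operand_bytes;  /* 3e9  */
    double sustained = offchip_operands_per_s * ops_per_miss;        /* 3e11 */

    printf("ops per off-chip operand: %.0f\n", ops_per_miss);
    printf("ceiling: %.0f GFLOPS (%.0f if each op counts as 2 FLOPs)\n",
           sustained / 1e9, 2.0 * sustained / 1e9);
    return 0;
}
```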

A shitload of calculations sounds about right to me, especially compared to the current gen, which is still dominated by simpler forms of lighting along with using as much prelit stuff as possible.
It is not the right time to move away from polygons as you could still use more polygons to improve on character/model details.
 
I have my pities for poor PSX3 developers.
Wow! You must be one heck of an empathic soul to be upset for them this much :LOL:

Things look that way on PSX2; games pretty much look alike
I probably have around 30 PS2 games, and except for some sequels (like Onimusha 1 and 2) I can't think of two games that look even remotely alike. PS2 is like a pool of different graphics engines. Even EA's games like NBA:Street 1 and NBA:Street 2 look absolutely nothing alike.
 
A meaningless figure, since PSX3 cannot sustain 1 TFLOPS; even 200 GFLOPS would be difficult.

You seem a bit too sure of yourself, Deadmeat, as usual...

I think their engineers thought about sustaining more than 1/5th of the potential FLOPS rating, and I think they know they can aim for a much better goal.

Typical C/C++-generated code consists of 40% MOV instructions. I am not sure what percentage of those MOV instructions are load/store-type memory instructions, but let's say half for the sake of argument, so the average ratio of all operations to memory access instructions is 5:1, which sounds about right.

Hold on right there...

"Typical C/C++-generated code"? By what compiler? For what target architecture? For what kind of algorithm?

You are over-generalizing...

You know that the figures vary quite a bit when you go from 8 architectural registers to 4,096 architectural registers ( we could debate about each APU sustaining 32 GFLOPS, or we could think, as you were doing, of the CPU sustaining the combined 1 TFLOPS, but then I can also say that the compiler has 4,096 total registers to work with... ).

Even going to the APU level, when you have 128 GPRs it is easier to optimize away LOAD/STORE instructions than would be possible on x86 and its 8 GPRs... and an Intel CPU has only those GPRs ( plus the XMM and MMX registers, but again, the Cell CPU doesn't share 128 GPRs across all the APUs; each APU has its own 128 ).
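A toy example of what I mean ( how a given compiler actually allocates registers is another story, so treat this as an illustration, not a measurement ):

```c
/* Illustration only: with 8 architectural registers per class (x86 GPRs,
 * or the 8 x87/XMM slots) a compiler is likely to spill some of these
 * accumulators to the stack every iteration, i.e. extra load/store MOVs;
 * with 128 registers per APU they can all stay live for the whole loop. */
float sum_of_products(const float *a, const float *b, int n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    float acc4 = 0.0f, acc5 = 0.0f, acc6 = 0.0f, acc7 = 0.0f;
    float total;
    int i = 0;

    for (; i + 8 <= n; i += 8) {      /* unrolled by 8: many live temporaries */
        acc0 += a[i + 0] * b[i + 0];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
        acc4 += a[i + 4] * b[i + 4];
        acc5 += a[i + 5] * b[i + 5];
        acc6 += a[i + 6] * b[i + 6];
        acc7 += a[i + 7] * b[i + 7];
    }
    total = acc0 + acc1 + acc2 + acc3 + acc4 + acc5 + acc6 + acc7;
    for (; i < n; ++i)                /* leftover elements */
        total += a[i] * b[i];
    return total;
}
```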

Also, the Cell programming paradigm does not seem to stick in your mind...

The e-DRAM is not main RAM ( if we define main RAM as where the code we execute on our processor lives [the caches only "cache" it locally, but the real deal must be in main [System] RAM] ), and the external RAM is NOT main RAM either...

The APUs Local Storage ( LS ) IS main RAM...

Yes, things are going to be subdivided ( from their bigger form ) into huge sets of 128 KB-sized elements you need to process ( these are the chunks of data the APUs process ), so if you wanted to know how many MOVs from the APUs hit the Yellowstone external RAM, I will say to you "0, none of them will"... not even the e-DRAM is directly addressable by the APUs; you DMA from and to e-DRAM, and the same should be true for the external RAM, which would be Yellowstone based.

As long as the Local Storages are not empty and the APUs are not left lacking data to process, the APUs will keep crunching.
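One way that pattern could look in practice ( dma_fetch / dma_wait are hypothetical stand-ins I made up, simulated here with memcpy, NOT a real Cell API ):

```c
/* Sketch of "keep the Local Storage fed": double-buffered transfers so the
 * APU crunches one 128 KB chunk while the next one streams in.  dma_fetch()
 * and dma_wait() are hypothetical helpers (simulated with plain memcpy). */
#include <string.h>

#define CHUNK_BYTES (128 * 1024)           /* the 128 KB work units above      */

static char local_store[2][CHUNK_BYTES];   /* two LS buffers: compute on one,  */
                                           /* stream into the other            */

static void dma_fetch(void *ls_dst, const char *ram, size_t offset)
{
    memcpy(ls_dst, ram + offset, CHUNK_BYTES);  /* stand-in for an async DMA   */
}

static void dma_wait(void)  { /* stand-in: wait for the outstanding transfer */ }
static void crunch(char *c) { (void)c; /* the APU kernel would run here      */ }

void process_stream(const char *ram, size_t total_bytes)
{
    size_t nchunks = total_bytes / CHUNK_BYTES;
    size_t i;
    int cur = 0;

    if (nchunks == 0)
        return;
    dma_fetch(local_store[cur], ram, 0);               /* prime buffer 0       */

    for (i = 0; i < nchunks; ++i) {
        dma_wait();                                    /* chunk i is now in LS */
        if (i + 1 < nchunks)                           /* start fetching i+1   */
            dma_fetch(local_store[cur ^ 1], ram, (i + 1) * CHUNK_BYTES);
        crunch(local_store[cur]);                      /* overlaps the fetch   */
        cur ^= 1;
    }
}
```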

Also, you seem to forget that I really, really doubt they are going to feed all these APUs directly with a 25.6 GB/s bus to external memory.

The patent ( and you dodge this point better than Neo dodges bullets in the Matrix movies ) clearly states a combined 1,024-bit pipe to main memory that uses several Memory Bank controllers and several DMACs ( one for each PE ) in a crossbar-switch memory set-up.

Do you imagine an EXTERNAL/PCB 1,024-bit-wide memory bus running at >500-600 MHz?

I do not... I do not even see such a bus working at 300 MHz ( DDR would mean 600 MHz signaling ).

Whoa... and at that frequency you might need differential signalling to keep the data from being corrupted by noise... that would mean... uhm... for data lines alone... only 2,048 pins/lines... designing the PCB would be incredibly tough and expensive.
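Just to put numbers on that objection ( my arithmetic, using the clock figures I floated above; purely illustrative ):

```c
/* Pin count and raw bandwidth for a hypothetical external 1,024-bit bus.
 * The width comes from the patent discussion above; the clocks are the
 * speculative figures mentioned in this post, nothing more. */
#include <stdio.h>

int main(void)
{
    const int    bus_bits = 1024;
    const double clocks[] = { 300e6, 600e6 };   /* bus clock; DDR doubles data rate */
    int i;

    printf("data lines: %d single-ended, %d differential\n",
           bus_bits, 2 * bus_bits);

    for (i = 0; i < 2; ++i) {
        double gbytes = (bus_bits / 8.0) * clocks[i] * 2.0 / 1e9;   /* DDR */
        printf("%4.0f MHz DDR -> %.1f GB/s\n", clocks[i] / 1e6, gbytes);
    }
    return 0;
}
```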


4 MB of insanely fast SRAM-based Local Storages feeding the APUs, 16-32 MB of really fast e-DRAM feeding the Local Storages ( the e-DRAM would not be clocked at the same speed the APUs are ), and then 256-512 MB ( with 256 MB being the most probable choice, even if I would not mind 512 MB as long as it did not mean cutting off other key areas of the architecture like Blu-ray functionality, the Cell chips' e-DRAM, etc... ) of 25.6 GB/s Yellowstone DRAM feeding the e-DRAM.

Assume that the CELL cache has a 95% hit rate (I am being very generous here), incurring one off-chip memory access for every twenty memory access instructions in the code stream.

Cell is virtually cacheless; all the micro-memories seem to be exposed ( this might change or not, and the Local Storages might also have portions that operate as caches [re-configurable by the programmer...], but I doubt it ) and have to be managed...

This is where a Sony provided versatile memory manager would help :)



5:1 × 20:1 = 100:1 maximum.

In a bandwidth-restricted architecture like CELL, bandwidth determines the sustainable FLOPS. Going by the real-world bandwidth estimation of Yellowstone at 12 GB/s (3 billion operands), you get a maximum attainable FLOPS figure of 600 GFLOPS. (If the cache hit rate is 90%, then 300 GFLOPS max.)

Why do you put the bandwidth of Yellowstone at 50% of its theoretical peak?

Even Direct RDRAM did better than that, and Yellowstone fixes basically all of Direct RDRAM's latency issues and then some: separate address bus, all data busses are bi-directional ( so we are not limited by the number and mix of WRITE or READ commands we issue to the Yellowstone DRAM ), and the local clock of the DRAM chips is 1.6 GHz ( 3.2 GHz for data transfers as we operate in DDR mode ), so the latency in getting the data off the memory and ready on the data bus should be lower than at Direct RDRAM's 400 MHz speed.

12 GB/s is less than 50%, as the maximum is 25.6 GB/s... you are predicting ~46.9% efficiency...
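( For reference, here is where the 25.6 GB/s peak and that ~47% figure come from; pairing a 64-bit interface with the 3.2 GHz data rate is my assumption: )

```c
/* Yellowstone peak vs. the claimed 12 GB/s.  Assumes a 64-bit interface at
 * the 3.2 GHz effective data rate mentioned above -- an assumed pairing,
 * not a published spec. */
#include <stdio.h>

int main(void)
{
    const double bus_bits  = 64.0;
    const double data_rate = 3.2e9;                  /* transfers per second  */
    const double claimed   = 12e9;                   /* the 12 GB/s estimate  */

    double peak = (bus_bits / 8.0) * data_rate;      /* 25.6 GB/s             */
    printf("peak %.1f GB/s; 12 GB/s is %.1f%% of it\n",
           peak / 1e9, 100.0 * claimed / peak);
    return 0;
}
```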
 
I probably have around 30 PS2 games, and except for some sequels (like Onimusha 1 and 2) I can't think of two games that look even remotely alike. PS2 is like a pool of different graphics engines. Even EA's games like NBA:Street 1 and NBA:Street 2 look absolutely nothing alike.

Marconelly, consider yourself BANDED, yeah I mean it... BANDED...

How can you honestly not agree that the Sonic Heroes footage and GTA3 look identical...? It is SO evident they are using the very same engine ( Criterion's Renderware ) :rolleyes:
 
Re: ...

DeadmeatGA said:
A meaningless figure, since PSX3 cannot sustain 1 TFLOPS; even 200 GFLOPS would be difficult.

No. You know what is difficult?

Speaking so confidently about a design YOU KNOW NOTHING DEFINITE ABOUT. Now that is difficult.

I'd like to see you admit you have no real idea what PS3 will look like, how it will perform, what the difficulties and pitfalls of the architecture will be, etc. Heck, I'd like to know if you're even fluent in any programming language, and if so to what extent, or even if you work in the industry or not (of course, you're free to bullshit me since this is the internet, but rest assured faf and people like him are likely to see through your act QUITE easily, so I recommend you refrain from any embellishments of your actual prowess in this regard).

the average ratio of all operations to memory access instructions is 5:1, which sounds about right.

...Except you're just pulling figures out of thin air / your nether regions here, of course. Maybe your unspecified C code, on an unspecified processor architecture, has that ratio. You can't know if physics/3D vector C code compiled for Cell will look the same.

x86 has an unusually high amount of MOV-type instructions since it only has 8 integer registers, and not all of those can be used for all instructions either, as far as I know (unless those restrictions were removed as x86 continued to slowly evolve). Cell sub-processors have 128 128-bit registers PER PROCESSOR. Quite a difference, wouldn't you say?

Your 5:1 ratio there is by no means a universal constant. I'd like to see you admit to that also.

Going by the real-world bandwidth estimation of Yellowstone at 12 GB/s (3 billion operands)

I've seen you toss around 12 GB/s before. Where did you get this number from? Oh, you took a 64-bit Yellowstone setup, cut the peak b/w in half, and ASSUMED that is an accurate representation of probable real-world performance. Well, guess again, mister. ;) You're again pulling numbers out of thin air/you-know-what.

What if average Yellowstone efficiency is higher than 50%? What if PS3 features a 128-bit memory width?
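Just to show how sensitive that "maximum" is to those two assumptions (using YOUR disputed 100:1 ratio purely for the sake of argument; the efficiencies and the 128-bit option are hypothetical scenarios, not known PS3 specs):

```c
/* Sensitivity of the bandwidth-limited GFLOPS ceiling to bus width and
 * efficiency.  The 100:1 ops-per-operand ratio is the disputed figure from
 * earlier in the thread, kept only for argument's sake; the widths and
 * efficiencies are hypothetical. */
#include <stdio.h>

int main(void)
{
    const double ops_per_operand = 100.0;
    const double operand_bytes   = 4.0;
    const double peak_64bit      = 25.6e9;           /* bytes per second     */
    const int    widths[]        = { 64, 128 };      /* bus width in bits    */
    const double efficiency[]    = { 0.50, 0.75, 0.90 };
    int w, e;

    for (w = 0; w < 2; ++w)
        for (e = 0; e < 3; ++e) {
            double bw     = peak_64bit * (widths[w] / 64.0) * efficiency[e];
            double gflops = bw / operand_bytes * ops_per_operand / 1e9;
            printf("%3d-bit bus at %2.0f%% efficiency: %4.0f GFLOPS ceiling\n",
                   widths[w], 100.0 * efficiency[e], gflops);
        }
    return 0;
}
```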

The rest of your calculations are as bogus as the data you base them on, no need for further comment. Please try again.

Or rather, please don't, as I'm sick and tired of your pessimistic rants.


*G*
 
Yes, Grall... I initially wrote 32 GPRs per APU... it is 128 GPRs per APU ( a brain fart of mine produced the 32 GPRs figure )...

This would make for 4,096 x 128-bit registers, or a total storage of 64 KB in registers ALONE...


Even counting each 128-bit register as containing a single non-packed data element, you would still have a total of 4K of them in the form of registers, and that is like half the L1 data cache of the current Northwood Pentium 4... in registers alone...
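The bookkeeping, assuming the 4 PE x 8 APU layout this thread has been working from ( patent-based speculation, not a confirmed spec ):

```c
/* Register-file bookkeeping for the speculated 4 PE x 8 APU configuration. */
#include <stdio.h>

int main(void)
{
    const int pes           = 4;
    const int apus_per_pe   = 8;
    const int regs_per_apu  = 128;
    const int bytes_per_reg = 128 / 8;                 /* 128-bit registers  */

    int total_regs  = pes * apus_per_pe * regs_per_apu;         /* 4,096     */
    int total_bytes = total_regs * bytes_per_reg;               /* 65,536    */

    printf("%d registers, %d KB of register storage\n",
           total_regs, total_bytes / 1024);
    return 0;
}
```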
 
The compiler will have fun with 4,096 registers :)

Man, the chip will be massive :)

Fortunately, once you get each APU working in silicon and you optimize its size well, debugging the rest of the logic should not be ultra hard, as we have a huge repetition of the same building block for the most part...

Once you build the first PE with e-DRAM and you get it running in silicon, you are not far off from a bigger chip adding other PEs on the same die...

It will push the 65 nm process hard, but it should not leave the logic/circuit debuggers fighting ultra-tiny bugs all over the place...
 
Developers could then compete on the merits of game play and artwork on even ground, which is actually a desirable thing.

They're doing that already...

Exactly, but smaller developers with little budget might not even get the chance to show their artistic skills because of their coding problems.

"Coding skills" aren't going to be the problem with regards to budget. If it is then it just means you're a choad for getting suckered into hiring a chump programmer. Also budget will not hinder your artistic qualities unless again you are a choad and have put yourself in the situation where you're trying to do too much with too little personel, or you've grossly over-estimated your abilities and/or under-estimated the requirements necessary to finish the title within a reasonable quality of finish.

Typical C/C++-generated code consists of 40% MOV instructions. I am not sure what percentage of those MOV instructions are load/store-type memory instructions, but let's say half for the sake of argument, so the average ratio of all operations to memory access instructions is 5:1, which sounds about right.

Typical? Does that also apply to ISAs like PowerPC, which have no MOV instructions?...
 
Developers could then compete on the merits of game play and artwork on even ground, which is actually a desirable thing.
For one, that's what's happening most of the time already. And two, there are also negative sides to it - because you will always have your Squares releasing patched-up PSOne engines loaded with $50mil+ content-creation budgets, which easily overshadow any tech smaller devs might come up with, without even trying. (ducks shots from Archie's BFG)
I kid, I kid :LOL:
But point being - don't go telling me art is competing on even ground for smaller devs; we're not in the 1980s anymore.

Things look that way on PSX2; games pretty much look alike because smaller developers are now forced to scrap their own engine development and license "standard" engines instead.
I thought PS2 was the polar opposite of that - most games look pretty jarringly different from one another. At times even from the same developer.

A meaningless figure, since PSX3 cannot sustain 1 TFLOPS; even 200 GFLOPS would be difficult.
Hey don't look at me, I used YOUR numbers.

It is not the right time to move away from polygons as you could still use more polygons to improve on character/model details.
I didn't suggest moving away from polys. But as you pointed out yourself, we'll be dealing with some compressed form or another (lately I've been partial to displacement mapping, although I may change my mind tomorrow) as well as heavy shader overhead.
 
I think he said that in regard to the animation system...

The rendering tech could not be... the polygon count and particle effects are at a level PSOne cannot handle nor dream of...
 
One thing I"m curious about is why so many people are harping on C/C++ as languages that game developers would use. Personally, I think that'd be shooting yourself in the foot, seeing as those languages are quite serial and doing multi-threading is worse than pulling teeth via your back side. In anycase, I'm wondering if languages such as Erlang or something fashioned in a simillar manner, which SEEM to support concurrency in a better fashion will become fashionable if not the main stay. I'm not sure why people still are huge fans of 30ish year old languages -- C/C++/Java... they're nice and all but really, can't we get significantly better?

Kinda sad how the hardware moves and the software side of things really only seems to ride on that.
 
Panajev2001a said:
The compiler will have fun with 4,096 registers :)

Man, the chip will be massive :)

Fortunately, once you get each APU working in silicon and you optimize its size well, debugging the rest of the logic should not be ultra hard, as we have a huge repetition of the same building block for the most part...

Once you build the first PE with e-DRAM and you get it running in silicon, you are not far off from a bigger chip adding other PEs on the same die...

It will push the 65 nm process hard, but it should not leave the logic/circuit debuggers fighting ultra-tiny bugs all over the place...

The size of Cell is my biggest concern about PS3. Frankly, 32MB of eDRAM and 4MB of eSRAM with all those registers should run over 200mm^2 in die size; an enormous amount of die space without any logic circuits at all. The 4 PEs, or whatever the CPU is called, should run about 25mm^2 each (my guesstimate), for 100mm^2 total. No clue about the APUs, but clearly they would take a lot as well. Simply put, I don't see this actually making it at 65nm without some sort of die-space-saving feature or cuts. I've read quite a few articles about Toshiba making very high-density eDRAM; that's probably going into the PS3. They may use smaller CPUs than I've mentioned; the CPU doesn't need to be too powerful anyway. Still, I don't see this thing coming in at under 300mm^2 total.

It's doable, maybe, but I doubt we'll see a PS3 by 2005 unless they make major cuts. Like you said, Panajev, this will push the 65nm process hard. If the Prescott rumor is true (that it runs at 100W initially), then it means that Intel is having a terrible time dealing with design problems (my guess is mainly gate leakage) at 90nm, and that will be even worse at 65nm. I don't see such a massive chip as Cell arriving the moment a process is ready. For 65nm, that's 2005, so the PS3 looks to me like a 2006 system.
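Just to show where my >300mm^2 lower bound comes from (every number here is a guesstimate from my post above, nothing official):

```c
/* Tallying the guesstimates: eDRAM+SRAM area plus four CPU cores, before
 * even counting the APUs.  All figures are speculation from the post above. */
#include <stdio.h>

int main(void)
{
    const double edram_sram_mm2 = 200.0;  /* "32MB eDRAM + 4MB eSRAM ... over 200mm^2" */
    const double cores          = 4.0;    /* the 4 PEs / CPU cores                     */
    const double core_mm2       = 25.0;   /* "(my guesstimate) 25mm^2 each"            */

    printf("lower bound before APUs: %.0f mm^2\n",
           edram_sram_mm2 + cores * core_mm2);
    return 0;
}
```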
 
I do not know why the APUs and the PUs should be huge...

I mean, you are calculating the SRAM ( Local Storage ) aside from the APUs' area, even though it is part of the APUs themselves... and you are taking the registers out too...

If you take out the Local Storage, the APU should not be incredibly big...

I think they can fit it in quite a bit less than 300 mm^2; I am not calling it less than 200 mm^2, but I cannot go as high as you...


Yes, it will push the 65 nm process hard: that might be a reason why they have already started building the fabs now, and why by the middle of next year they can start mass-manufacturing 65 nm chips...
 
I don't see the Cell at much less than 300mm^2, maybe 250-300, since they will have some way to reduce the size. My guesses were roughly based on what happens if you simply slap the basic parts on a chip, which they aren't doing.

Actually, my biggest concern is not the size of the chip (it shouldn't be a major problem unless its size is >>300mm^2), but the heat such a big chip will produce. Gate leakage is going to be huge at 90nm and even more so at 65nm. All the recent info on the Prescott, and later the Tejas, shows that they are heat monsters, and these chips are only around 120-130mm^2. For something like the Cell, even if it's close to 200mm^2, the heat will be much greater relative to clock speed. Frankly, I don't see the Cell going anywhere near 4GHz as some may have predicted. Although such a high clock speed may be possible, the heat would simply be overwhelming AFAIK.

Bold prediction 8) :oops: : I predict that the Cell will clock very much like a GPU and hit around 1-1.5 GHz.
 