Kutaragi Ken: Want a teraflop? You have to buy a rack from us

chaphack · Nov 5, 2003

Phil said:
You'd actually learn a thing or two if you actually read and comprehended what people try to tell you, instead of arguing like a primitive 10 year old trying to put down praise of anything Sony related.

I do. It just depends on who the authors are.

Some people are just as clueless as I am, while the good ones, I love reading their posts.

And no, I'm not putting anything Sony down. If something deserves to have doubts, be darn sure to get them from chappers. On one hand you have the glowing Sony Optimists, while on the other the down-to-earth Sony Disbelievers. SO vs SD. Cancels each other out. Brings balance to the place!

Paul · Nov 5, 2003

Firstly, how well qualified are you with 3D hardware systems? Be honest: do you really understand the full goings-on, or are you just doing a basic cut-and-paste job? It might sound rude (sorry), but being the n00b that I am, I don't want to be wasting time and not learning anything.

Several degrees in college, before finally deciding that I wanted a master's in Networking (I did get it).

Yes. But how do you know they are going to be the same? Even today, we have all those quirky MSAA/FSAA/Quincunx/OGASS/something whacawhaca. Even then, AA performs better on ATI R300 cards vs the same-gen rival NV30..

They will be relatively the same next gen quality wise, more akin to that of GC and Xbox now.

PS3 VS Xbox2 VS N5 IQ wise.... there will be no PS2 to Xbox differences. Sony is smarter this time; the differences will be small, not unlike GC to Xbox. You're also forgetting, we aren't talking about PCs here, Chap. So any RXXX to NVXX arguments are basically invalid. Consoles are dedicated hardware.

Will Xbox2 have "better" IQ? Sure, but then again Xbox also has better hardware for IQ compared with the Gamecube. But when it comes to Xbox VS GC IQ.. how big of a difference is there? Not much, if any, for most games.

Because it might not be powerful enough?

You wouldn't see any GPU maker give the thing 10 GFLOPS of peak floating-point performance and ditch the hardware shader support.

When you have massive floating-point performance (I'm not talking NVflops) on a GPU you can crunch shaders very well. Take a look at Figure 6 in the Cell patent. There is a reason why the VS's have all that floating-point power. I'll give you a hint, it's not for marketing.

chaphack · Nov 5, 2003

Several degrees in college, before finally deciding that I wanted a master's in Networking (I did get it).

As? No offense, but your posts come across more like a journeyman's rather than an expert's...

They will be relatively the same next gen quality wise, more akin to that of GC and Xbox now.

PS3 VS Xbox2 VS N5 IQ wise.... there will be no PS2 to Xbox differences. Sony is smarter this time, the differences will be small not unlike GC to Xbox

So it is Sony Optimism...

You're also forgetting, we aren't talking about PCs here, Chap. So any RXXX to NVXX arguments are basically invalid. Consoles are dedicated hardware.

Why does dedicated hardware make AA any better/faster... you still have to go through the bandwidth/RAM/mipmaps/whatnot ....

When you have massive floating-point performance (I'm not talking NVflops) on a GPU you can crunch shaders very well. Take a look at Figure 6 in the Cell patent. There is a reason why the VS's have all that floating-point power. I'll give you a hint, it's not for marketing.

So how do you propose programming and optimising a shader algorithm for such a setup compared to a hardwired one? When you talk about flexibility, just what's the gain? What type of stuff are you going to show? What will be the visual difference? Go on, tell us more. Thanks.

Paul · Nov 5, 2003

As? No offense, but your posts come across more like a journeyman's rather than an expert's...

Why waste your time writing up huge technical posts when there are those that will do it for you. Someone else here I know realizes this, but I'm not going on.

So it is Sony Optimism...

Common sense says there will be no IQ problems next gen. Sony has learned its lesson.

Why does dedicated hardware make AA any better/faster... you still have to go through the bandwidth/RAM/mipmaps/whatnot ....

It's different. When you turn on AA in a PC game it's brute force: the GPU will force a certain degree of AA (be it 4x or whatever) over the entire screen (FSAA).

With a console you have a dedicated system; the developer is working specifically for said specifications. There are certain tricks and clever work a developer can employ.

Again I'm trying to word this so that you'll understand..

So how do you propose programming and optimising a shader algorithm for such a setup compared to a hardwired one? When you talk about flexibility, just what's the gain? What type of stuff are you going to show? What will be the visual difference? Go on, tell us more. Thanks.

Again.. the advantage is clear. While your competitor is wasting time locking features in hardware, be it let's say Pixel Shader 6 (made-up number) or something, you will have massive floating-point power on your GPU so that developers can program any type of shader they want. Shaders that, given the power, would probably be better than the competitor's hardware-locked functions.

Visual difference? Depends on many things.

chaphack · Nov 5, 2003

Why waste your time writing up huge technical posts when there are those that will do it for you. Someone else here I know realizes this, but I'm not going on

ehh-huh..hmm.

Common sense says there will be no IQ problems next gen. Sony has learned its lesson

Common sense dictates that Sony shouldn't even have IQ problems this gen

It's different. When you turn on AA in a PC game it's brute force: the GPU will force a certain degree of AA (be it 4x or whatever) over the entire screen (FSAA).

Why is it any different from turning on FSAA for consoles?

With a console you have a dedicated system; the developer is working specifically for said specifications. There are certain tricks and clever work a developer can employ.

I think everyone knows about the advantage of dedicated systems. Then again, what makes you think the final performance/result will be the same?

Again I'm trying to word this so that you'll understand..

Oookay.

Again.. the advantage is clear. While your competitor is wasting time locking features in hardware, be it let's say Pixel Shader 6 (made-up number) or something, you will have massive floating-point power on your GPU so that developers can program any type of shader they want. Shaders that, given the power, would probably be better than the competitor's hardware-locked functions.

Visual difference? Depends on many things.

Hmmm...

Have a nice day Paul. Going off now. Nice talking to you.

chaphack · Nov 5, 2003

Last thing, hope Paul is not busy going through Googling or PMing.

Byeeee.

Paul · Nov 5, 2003

So you mock my position, basically. Why did I even give you a chance? If you're not going to want to hear what is said, I'm not even going to bother anymore.

But I must clarify this:

With a console you have a dedicated system; the developer is working specifically for said specifications. There are certain tricks and clever work a developer can employ.

was explaining why PC AA is different.

Vince · Nov 5, 2003

AzBat said:
I understand that Sony Cell and IBM Cell are not the same, but you still haven't given me any comparisons on how they differ. Just remember I didn't understand any of the technical mumbo-jumbo you just posted. I just want to be able to figure out what FLOPS, cores and APUs the Sony version will have and how it relates to these slides. Is it as DM says and that Sony won't release a 2 Tera FLOP capable part for PS3 or is it that Sony's PS3 is more capable than even IBM's Blue Gene? I'm totally confused.

Hey Tom,

The concept of a Cellular Architecture as I've come to know it is pretty simple to understand and is the embodiment of three major architectural ideologies to achieve high performance in the areas where typical ICs get trapped by the Von Neumann bottleneck*. These three are:

Local
Parallel
Simple

The "pure" concept is this: You say F-you to the high superscalar with hundreds of instructions, long pipelines, a large area % utilized by prediction routines and such - and instead you choose a paradigm that takes a simple pipeline with a reductionist instruction set and significant embedded RAM and create a fundamental construct, a Cell. This is the Simple part.

Next, you hit the copy and paste button a given number of times that should be bounded by 10^N. This is the vastly Parallel part.

Finally, you interconnect said Cells based on locality, thus bounding them to direct communications with a given number of Cells within three spatial dimensions. By doing so, you eliminate a central control point from the architecture and it's basically a vast finite state machine. This is the Local part.

The ideology behind this "pure" Cellular ideal is pretty sound for general supercomputing needs. I like it due to it being a pseudo-analogue to a closed neural system - which, well, nevermind. It's also been shown in studies, as well as comments by people here, that x86 is a tremendous waste of logic and area when all is said and done - Cellular computing got around this through the Simple part, where it traded ~20% of absolute performance for a gain of ~50% die space back.
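To put rough numbers on that trade, here is a minimal sketch. It assumes "~50% die space back" means the simple core occupies about half the area of the complex one; that reading is an assumption, and the function name is made up for illustration.

```python
def perf_per_area_gain(perf_fraction: float, area_fraction: float) -> float:
    """Performance per unit area relative to the baseline core (baseline = 1.0)."""
    return perf_fraction / area_fraction


# Assumed reading of the post: keep ~80% of the performance in ~50% of the area.
gain = perf_per_area_gain(perf_fraction=0.80, area_fraction=0.50)
print(f"performance per area vs. baseline: {gain:.2f}x")
```

Under those assumed figures, filling the original area with two simple cores would give roughly 1.6x the aggregate throughput, provided the workload actually parallelizes.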

---

Now, STI's Cell vis-à-vis the Suzuoki Patent is a different beast altogether. It's definitely been designed for a post-modern broadband environment where they've done away with the Locality step altogether if Suzuoki's patent is true (which I tend to believe as he patented the EE in 1997). That's the first thing that's apparent and it's what will open the door for any computation or data sharing with other STI Cell devices.

Then, when you look at the actual microarchitecture, they've [STI] basically done away with the whole Simple step when it comes to a global view. They've obviously designed it with 3D, AI or World Sim type applications in mind as they've loaded the bulk of the processor's area up with dedicated logic blocks [eg. APUs] which have most likely been area optimized - perhaps utilizing this..

Thus, we can expect the "pure" Cell ideology of area efficiency to carry over in the individual constructs - as seen in the APUs, or the PE cores which I'd hazard a guess are stripped down PPC or MIPS cores. This is a relationship that would seem to be born out of necessity - as you can see by the Blue Gene chart DM is championing, you won't get a plurality of general purpose MIPS or PPC pipelines to reach 1TFLOP; you need to revert to dense/coarse-grained constructs of dedicated logic - ergo the APU. So, there is this balance between striving for a single, "pure" cellular construct and getting the computational power you desire out of a given area, A (perhaps, 250<A<320) which is capable of manufacturability.

Also, the eDRAM would seem to remain as it's an integral part of the entire ideology. But, I don't need to get into this with people like Panajev, so that's enough for now.

And then, finally, there is the Parallel step, where STI Cell isn't really 1-to-1 comparable. Consider this: while each IC has a coarser granularity and thus won't be bound by the 10^N bound per IC/per system, it has broken free of the locality requirement. Thus, it has a theoretical upper bound of the aggregate of every STI Cell device in existence. In practice, this will be much more limited, but for IBM servers or Computing-as-Utility or within a Sony/Toshiba living room, this could potentially kick in nicely.

Thus, in conclusion I think it's become apparent that while STI Cell shares many fundamentals of the Cellular Architectural ideal, it's a set-piece, custom IC that deviates from it at several major junctions. In fact, personally, I would question if it's a Cellular Architecture when viewed as a singular IC - perhaps Mfa's aptly termed Memory Anemic Supercomputing array (or something to that effect) is more correct. But, what would I know... I have little formal training in architectures, and WTF do us in the Neurosciences know?!?

I'm probably forgetting stuff, I always do. Yet, I need to get going, so if you see a problem PM me or correct it outright. Unless your name is Chap or DMGA - in which case I really don't want to hear it.

* This being the [increasing] differential between processor clock cycles and RAM access times, which are high multiples of a clock cycle and lead to stalls and such. This necessitates logic to predict and deal with masking this cost, thereby further leading you down the rabbit-hole of hemorrhaging logic on non-computational areas with decreasing benefit.
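The footnote's point can be illustrated with the usual back-of-the-envelope stall model. Every number below is an illustrative assumption, not a figure from any real chip, and the function name is hypothetical.

```python
def effective_cpi(base_cpi: float,
                  mem_refs_per_instr: float,
                  miss_rate: float,
                  miss_penalty_cycles: float) -> float:
    """Average cycles per instruction once memory stalls are added in."""
    return base_cpi + mem_refs_per_instr * miss_rate * miss_penalty_cycles


# Hypothetical core: 1 cycle per instruction when it never stalls,
# 0.3 memory references per instruction, 2% of them missing to RAM.
# As the clock outpaces RAM, the same miss costs more cycles.
for penalty in (20, 100, 400):
    cpi = effective_cpi(1.0, 0.3, 0.02, penalty)
    print(f"miss penalty {penalty:>3} cycles -> effective CPI {cpi:.2f}")
```

The miss rate stays fixed in this sketch; only the penalty grows, which is the widening gap the footnote describes.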

DeadmeatGA · Nov 6, 2003

...

I understand that Sony Cell and IBM Cell are not the same, but you still haven't given me any comparisons on how they differ.

There are three identified IBM Cellular architectures.

1. Blue Gene Cyclops

This one uses a radical SMT processor design to scale. It keeps 32 active threads on core but runs only 1 thread at a time. The first contender in the Blue Gene design competition.

2. Blue Gene L

This is the second entry to IBM's internal Blue Gene design competition. It uses twin PPC core per node design, one running the OS and the other dedicated to computing.

3. STI Cell(aka Sony Cell)

This architecture appears to be a modification of Blue Gene L, in which the compute engine is replaced with custom vector units (aka APU) to boost floating point performance. The way it works is pretty much identical to Blue Gene L, but the number of APUs can be varied to meet a particular performance goal.

Is it as DM says and that Sony won't release a 2 Tera FLOP capable part for PS3

PSX3 cannot possibly reach a teraflop; it costs too much. By Kutaragi's own presentation, you need to pack hundreds of Cell processors on a rack to attain "Greater than 1 Teraflop" performance.

or is it that Sony's PS3 is more capable than even IBM's Blue Gene? I'm totally confused.

If SCEI CELL was more capable than Blue Gene, then there is no point in continuing with Blue Gene program in the first place. But IBM continues, which should tell you something about how IBM feels about SCEI CELL.

Panajev2001a · Nov 6, 2003

I understand that Sony Cell and IBM Cell are not the same, but you still haven't given me any comparisons on how they differ. Just remember I didn't understand any of the technical mumbo-jumbo you just posted. I just want to be able to figure out what FLOPS, cores and APUs the Sony version will have and how it relates to these slides. Is it as DM says and that Sony won't release a 2 Tera FLOP capable part for PS3 or is it that Sony's PS3 is more capable than even IBM's Blue Gene? I'm totally confused.

Tommy McClain

IBM CELL = SCE CELL = Toshiba CELL = STI CELL

( STI = Sony+SCE [now SSNC], Toshiba and IBM: they have a center in which they work together in Austin, TX )

So, Sony CELL and IBM CELL are the same basic architecture: if you look at Suzuoki's ( from SCE ) CELL patent and at the IBM patents about APU, PE, multi-PEs you will see that they are talking about the same thing. A good guess is that Suzuoki's patent represents a final draft regarding the CELL concept and its basic design principles and goals.

What we know about BlueGene shares some ideas with CELL, but fundamentally it is a different and separate thing.

Suzuoki's CELL patent defines the APU as having a SIMD structure with 4 groups of FP/FX Units, and this is the same as what we see in a previously published IBM patent regarding the APU: the unit is capable of scalar or vector operations.

The peak is 1 Vector Operation ( either FP or FX ) per cycle per APU: if this is a MADD ( Multiply-Add ) operation this means 2 operations/cycle ( MADD is something like R1 = R2 * R3 + R4; ).

Suzuoki's patent rated the APU at 32 GFLOPS.
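Those figures pin down the clock the rating implies. A minimal sketch, assuming all four FP/FX groups retire one MADD every cycle and that the 32 GFLOPS figure is a pure peak number (the helper names are made up for illustration):

```python
def peak_gflops(lanes: int, flops_per_lane_per_cycle: int, clock_ghz: float) -> float:
    """Peak GFLOPS = lanes x FLOPs per lane per cycle x clock in GHz."""
    return lanes * flops_per_lane_per_cycle * clock_ghz


def implied_clock_ghz(rated_gflops: float, lanes: int, flops_per_lane_per_cycle: int) -> float:
    """Clock needed to hit a rated peak, given the lane count and FLOPs per lane."""
    return rated_gflops / (lanes * flops_per_lane_per_cycle)


APU_LANES = 4           # 4 groups of FP/FX units, per the patent description above
MADD_FLOPS = 2          # a multiply-add counts as 2 floating-point operations
APU_RATED_GFLOPS = 32.0

clock = implied_clock_ghz(APU_RATED_GFLOPS, APU_LANES, MADD_FLOPS)
print(f"implied APU clock: {clock} GHz")
print(f"check: {peak_gflops(APU_LANES, MADD_FLOPS, clock)} GFLOPS")
```

In other words, the rating only holds if every lane issues a MADD on every cycle at that clock; any cycle spent on loads, branches or non-MADD arithmetic comes straight off the peak.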

IBM had previous work on Cellular architecture concepts, CELL is their latest one.

Cellular Computing and CELL are not synonyms.

zidane1strife · Nov 6, 2003

The past...

some expected... ps2 perf to be 5-10m verts... some expected psp to have psone level gphx... some expected nvidia to keep beating ati... some did not...

The present...
some expect 1Tflops... but there are some that do not...

but a glimpse of the future... might be given... in early 2004...

Panajev2001a · Nov 6, 2003

If SCEI CELL was more capable than Blue Gene, then there is no point in continuing with Blue Gene program in the first place. But IBM continues, which should tell you something about how IBM feels about SCEI CELL.

STI CELL != Sony CELL

IBM also keeps a lot of internal architectures in house: Mainframes, POWER, PowerPC, x86 ( they have successful server lines based on Intel x86 CPUs ) and in the future x86-64 and probably even IPF.

Adding CELL to their IP bag is not a problem and they do plan to use CELL

BlueGene was not designed for ultra fast Single Precision FP processing and multi-media which is what STI is working for.

APUs are not an SCE invention as you seem to maintain, but were being worked on before PlayStation 2 was even released on the market.

APUs are not FP-only processors either; they work on both Scalar and Vector operations ( FP and FX operations ).

glw · Nov 6, 2003

Re: ...

DeadmeatGA said:
1. Blue Gene Cyclops

This one uses a radical SMT processor design to scale. It keeps 32 active threads on core but runs only 1 thread at a time. The first contender in the Blue Gene design competition.

Up to 32 threads run at once, one executing thread per thread group;
the current spec can run 32 threads simultaneously. The number
of threads being processed is much larger, 256 in the reference
design.

Each thread group shares a 64 entry register file, a program counter,
ALU, instruction sequencer, an FPU and data cache (16 kB).
Instruction caches (32 kB) are shared by two thread groups.

There are 16 banks of 512 kB DRAM, with a bandwidth of 40 GB/s.
Alternate DRAM designs allow for up to 160 GB/s.

Any thread can issue on any cycle if execution resources allow,
if more than 1 thread tries then execution is scheduled in a
round-robin fashion.
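The round-robin arbitration described above can be sketched as follows. This is a generic arbiter, not Cyclops's actual issue logic, which is not detailed here; the function name and the four-thread request pattern are made up for illustration.

```python
from typing import List, Optional


def round_robin_grant(requests: List[bool], last_granted: int) -> Optional[int]:
    """Pick the next requesting thread after `last_granted`, wrapping around.

    Returns the index of the granted thread, or None if nobody is requesting.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_granted + offset) % n
        if requests[candidate]:
            return candidate
    return None


# Hypothetical example: threads 0, 1 and 3 all want the shared unit every cycle.
requests = [True, True, False, True]
last = 0
grants = []
for _ in range(4):
    winner = round_robin_grant(requests, last)
    grants.append(winner)
    if winner is not None:
        last = winner

print(grants)
```

Starting the search one slot past the previous winner is what keeps any single thread from monopolizing the shared resource.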

The ISA is a 3-operand load-store architecture using 60 of the
most common PowerPC instructions with multi-threading extensions.

At 500 MHz a Cyclops chip peaks at 32 GFlop/s.

Cyclops is a precursor to the final architecture for the BlueGene/P
machine, which Cell may be related to. It's not a big jump to
assume Cell will be well north of 32 GFlop/s.
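The Cyclops figures above hang together arithmetically. A minimal sketch, assuming one FPU per thread group, 32 thread groups (one executing thread each), and one fused multiply-add (2 FLOPs) per FPU per cycle; the last of these is an assumption needed to reach the quoted peak, not something stated above.

```python
THREAD_GROUPS = 32             # one executing thread, and one FPU, per group
CLOCK_GHZ = 0.5                # 500 MHz
FLOPS_PER_FPU_PER_CYCLE = 2    # assumption: one multiply-add per cycle

peak_gflops = THREAD_GROUPS * FLOPS_PER_FPU_PER_CYCLE * CLOCK_GHZ

DRAM_BANKS = 16
BANK_KB = 512
total_dram_mb = DRAM_BANKS * BANK_KB / 1024


def bytes_per_flop(bandwidth_gb_s: float, gflops: float) -> float:
    """Memory bytes available per peak floating-point operation."""
    return bandwidth_gb_s / gflops


print(f"peak: {peak_gflops} GFlop/s, embedded DRAM: {total_dram_mb} MB")
for bw in (40.0, 160.0):
    print(f"{bw} GB/s -> {bytes_per_flop(bw, peak_gflops)} bytes per FLOP")
```

The bytes-per-FLOP ratio is a quick way to compare how well fed the arithmetic units are across designs, which is the crux of the "memory anemic" worry raised earlier in the thread.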

DeadmeatGA · Nov 6, 2003

...

STI CELL != Sony CELL

You puzzle me greatly.

Cyclops is a precursor to the final architecture for the BlueGene/P
machine, which Cell may be related to.

Cyclops is not a final precursor to BlueGene/P; it is just one of two(or three) possible candidates at this point. STI is indeed directly based on Blue Gene/L, however.

DeadmeatGA · Nov 6, 2003

...

Local

Real world data processing is rarely local. Hell, at least 50% of the variables I use in a function definition are referenced from the previous function's stack frame. You pretty much pass everything by reference/pointer except for char, int and float, making your code highly dependent on the state of whatever function called your code. (This is why I want all C-derivative languages dead.)

Parallel

Easier said than done. You actually get used to it after a while, but I would avoid it if I could.

Simple

Simple for hardware engineers, a nightmare for coders.

You say F-you to the high superscalar with hundreds of instructions, long pipelines, a large area % utilized by prediction routines and such

Why say F-you to such processor? That processor makes a poor coder's life a lot easier.

instead you choose a paradigm that takes a simple pipeline with a reductionist instruction set and significant embedded RAM and create a fundamental construct, a Cell.

And I can't imagine how you would program for such a beast. Tim Sweeney shares the same feeling too.

that x86 is a tremendous waste of logic and area when all is said and done

But it still gives the best performance/buck.

Now, STI's Cell vis-à-vis the Suzuoki Patent is a different beast altogether.

No it is not. It simply replaces BlueGene/L's PPC compute core with APUs.

you need to revert to dense/coarse-grained constructs of dedicated logic - ergo the APU.

Or actually invest in an auto-parallelizing compiler like Sun is doing. I don't like IBM's approach of "Here is our BlueGene message passing API, now U DO it". And I despise Kutaragi Ken's approach of "We don't even have a compiler and basic library, but here is an assembler so that you can write something in time for launch in 4 months".

Vince · Nov 6, 2003

Re: ...

Dude, just STFU... you give me a headache it hurts that much to read you.

DeadmeatGA · Nov 6, 2003

...

Dude, just STFU...

Saying STFU to another member deserves a ban. Nice knowing you, Vince.

Vince · Nov 6, 2003

Re: ...

DeadmeatGA said:
Saying STFU to another member deserves a ban. Nice knowing you, Vince.

Right. Give me a logical argument to fight and I'll do it. You're posting utter rubbish; "I despise Ken..." Yeah, great, go have fun with his picture on your own time.

To paraphrase Top Gun, "Son, your ego is writing checks your body can't cash" - then again... neither do they pan out in reality.

For example, your comments are so utterly obtuse:

DMGA said:
But it [x86] still gives the best performance/buck.

And if there were 500 million+ EmotionEngine derivatives out there, you can bet its costs would normalize. And why are we even debating costs?

Let's talk about performance per area. Ohh no, we can't talk about something that shows STI in a [massively] good light. The horror!

Look at how you responded to me: you took out single words and wrote BS responses that are made in total ignorance, as we - we as in all of us - have no idea how it'll be handled on Cell/PS3. Get a life.

Megadrive1988 · Nov 6, 2003

around 1995 Sony planned to introduce a new Playstation every 3-4 years.

Playstation 2 would have been out in 1997 or 1998 to combat the 3DO M2 and 3DO MX as well as whatever Sega could offer. This PS2 would have had performance in the 800,000 ~ 2,000,000 triangles per second range.

Playstation 3 was to be out in 2001, probably to combat the 3DO M3 and Nintendo N2000. That PS3's performance was unknown, but it is likely that *that* PS3 would have been close in power to what *today's* PS2 is (the one that 60 million gamers all over the world have), since we are talking 1 year of difference.

So our PS2 is *probably* the old PS3 that was planned around 1995.

this:

(which I tend to believe as he patented the EE in 1997)

supports the above theory IMHO.

Fafalada · Nov 6, 2003

DM,
that slide is more than 2 years old (closer to 4 if I'm not mistaken), and it's a BlueGene presentation. The Sony pdf simply reused the same picture. So unless Cell is 100% identical (and you already admitted it's not) to BlueGene, I fail to see how you can use those numbers to prove anything.

This architecture appears to be a modification of Blue Gene L, in which the compute engine is replaced with custom vector units(aka APU) to boost floating point performance.

I don't think so. APUs are described as general purpose - including integer throughput equivalent to the FPU one. Maybe console variation will be different but that takes away from your line of reasoning about the patent again.

Real world data processing is rarely local.

Last I checked high data locality is what makes all the graphics processors out there work.
Now maybe you want to discuss problems outside game consoles, but that's not really a topic fitting this forum, I think.

Btw, all this aside, you already argued feverishly that Cell will not be in PS3 at all (unless I misread your arguments) so why bother downplaying the architecture still?
