CELL Patents (J Kahle): APU, PU, DMAC, Cache interactions?

PC-Engine said:
MfA said:
A 65 nm Cell processor with the same die size as the 250-nm EE running at 6 GHz? (Assuming most of it isn't eDRAM.) I would guesstimate 250 Watt plus.

Uh..MfA, haven't you heard of the 81GHz diamond transistor? It'll be in PS3 for sure. :LOL: :p

Sony will then be able to market the PS3 as the "console with the bling"... Will go down well with GTA:SA fans.......
 
MfA said:
A 65 nm Cell processor with the same die size as the 250-nm EE running at 6 GHz? (Assuming most of it isn't eDRAM.) I would guesstimate 250 Watt plus.

My semiconductor physics isn't great but I'll have a go ;)

The EE was 15W and the GS was 10W @ 1.8V

http://www.beyond3d.com/forum/viewtopic.php?p=360520#360520

The densest, transistor-wise, is the GS with ~ 43 million transistors at 250nm on a 279mm^2 die @ 1.8 Volts, 10W, 150MHz...so I'll use it as an example...

If Power is proportional to Capacitance (?), Frequency (150MHz) and Voltage^2 (1.8V),

150MHz GS = 10 W @ 1.8V

6GHz GS = 400 W @ 1.8V

I'm not sure what the voltage at 65nm/45nm would be...I read somewhere it would be around ~ 0.8V but please correct me if I'm wrong. Let's assume 0.9V (half of 1.8V)...

6GHz GS = 400* (0.9/1.8 )^2 = 400/4 = 100 W @ 0.9V

I'm not sure how to deal with the capacitance here, any suggestions? There would be more trannies but their charges would be smaller, no? So I'll assume it's constant.

6GHz GS = 100 W @ 0.9V

4GHz GS = 100 * (4/6) ~ 67 Watts at 65nm, 0.9 Volts

So the guesstimate for the CPU/BE is ~ 67 Watts @ 65nm, 0.9V, at the GS die size of 279 mm^2

This is less than today's top-end CPUs at 130nm! :p Any thoughts?
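
For anyone who wants to replay the arithmetic above, here is a minimal sketch. It assumes dynamic power scales as C * V^2 * f with capacitance held constant (the same simplification made above) and it ignores leakage entirely. The function name scale_power and the cap_ratio parameter are made up for illustration.

```python
def scale_power(p_ref_w, f_ref_hz, v_ref, f_new_hz, v_new, cap_ratio=1.0):
    """Scale a reference dynamic power figure using P ~ C * V^2 * f.

    cap_ratio is the assumed ratio of new to old switched capacitance;
    1.0 reproduces the "assume it's constant" simplification. Leakage
    power is not modelled at all.
    """
    return p_ref_w * cap_ratio * (f_new_hz / f_ref_hz) * (v_new / v_ref) ** 2


# Reference point quoted in the post: GS at 150 MHz, 1.8 V, 10 W.
GS_POWER_W = 10.0
GS_CLOCK_HZ = 150e6
GS_VOLTS = 1.8

p_6ghz_18v = scale_power(GS_POWER_W, GS_CLOCK_HZ, GS_VOLTS, 6e9, 1.8)
p_6ghz_09v = scale_power(GS_POWER_W, GS_CLOCK_HZ, GS_VOLTS, 6e9, 0.9)
p_4ghz_09v = scale_power(GS_POWER_W, GS_CLOCK_HZ, GS_VOLTS, 4e9, 0.9)

print(f"6 GHz @ 1.8 V: {p_6ghz_18v:.0f} W")
print(f"6 GHz @ 0.9 V: {p_6ghz_09v:.0f} W")
print(f"4 GHz @ 0.9 V: {p_4ghz_09v:.0f} W")
```

Changing cap_ratio is the quickest way to see how sensitive the estimate is to the capacitance assumption, which is the weakest part of the guess.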


Vince said:
...
I question how much of a difference there is once you had IBM come on board in 2000. From what I've heard, of the 10 principals on Cell, only 2 were from Sony. You be the judge.

How many were from Toshiba?

ultimate_end said:
....
I have a feeling however that Sony's cell may not necessarily end up being the exact same as IBM's cell. Or not.
...

They probably wouldn't. They only need to share the same CELL ISA in the same way Intel and AMD etc. share x86 ISA.

ultimate_end said:
...
Now from what I have read (Ask Paul, he will point you in the right direction), Sony intends to use XDR and Redwood interface technology not only with the broadband engine, but also in the Broadband Engine! Now you can draw all of your own conclusions from that.
...

Where did you guys read that?

ultimate_end said:
...
Judging from Masakazu Suzuoki's Broadband engine/Visualizer combination, I would be willing to wager that Physics/animation processing capability has been very much at the front of people's minds at Sony.

They like to call it 'World Simulation' these days! :) I remember seeing a demo video of this (PS3 related) with some girl paddling in an ocean and interacting with a flock of birds flying around her but I can't seem to find it anymore! :?

Anyway this is what they (Okamoto) had to say at GDC2002,

http://archive.gamespy.com/gdc2002/okamoto/

nAo said:

Have you found any other amendments in there? I've had a quick scan but couldn't 'see' any apart from the assignees? :?

qwerty2000 said:
Hey Jaws

It seems you know what you're talking about. Can you give me your estimate of the PS3 specs, and try to be as realistic as you can, please? Thanks in advance.

:) I only know as much as the next man! You have to remember that pretty much all of this is speculation ( with foundations of course ;) ) and very little official news is known. But FWIW, I don't think it will be that far off that diagram! :p
 
Jaws said:
6GHz GS = 100W

4GHz GS ~ 67 Watts at 65nm, 0.9 Volts

So the guesstimate CPU/BE ~ 67 Watts @ 65nm, 0.9V, GS die size, 279 mm^2

This is less than today's top-end CPUs at 130nm! :p Any thoughts?

What about leakage current? Prescott has it at 20-30% of its TDP, but I'm not sure about CPUs with low-k etc. applied.
 
I just doubled the 90 nm 970FX power consumption and included a little extra for the fact that the frequency is higher than simple scaling rules would give for 90->65 nm ... why do I think that is reasonable? Instinct :) To me it doesn't seem worth the trouble trying to apply scaling rules of thumb. With all the unfounded assumptions you need before you can even whip out the arithmetic, it seems a waste of time.
 
Jaws said:
The densest, transistor-wise, is the GS with ~ 43 million transistors at 250nm on a 279mm^2 die @ 1.8 Volts, 10W, 150MHz...so I'll use it as an example...
A good chunk of those transistors are used for the eDRAM.
But OTOH, you didn't apply Low-K and SOI in your example, so... ;)

Also, we have to consider the fact that neither the GS nor the EE were designed with high frequencies in mind.
Cell is definitely created with high frequencies in mind, judging by the patents and the people behind the project alone. IMHO.
The only question is, will they succeed in achieving a really high frequency for the PS3 version of Cell?
 
Jaws said:
This is less than today's top-end CPUs at 130nm! :p Any thoughts?

Yeah.
Most of GS is framebuffer transistors that don't dissipate much power. A future PS3 GPU is sure to contain A LOT more computing elements (an aspect where the current GS is rather strapped for resources, it doesn't even have a full range of blend modes).

Using current tech and extrapolating to show how much power a future chip will draw is Deadmeat reasoning that I don't believe actually works. Let's just wait and see what Sony comes up with... Whatever it is, it's sure to work. :)
 
Most of GS is framebuffer transistors that don't dissipate much power. A future PS3 GPU is sure to contain A LOT more computing elements


mmmmmmmm PS3 GPU computing elements....ahhggggggg *Drool*

can't wait to find out what's inside this beast :)
 
Jaws said:
ultimate_end said:
...
Now from what I have read (Ask Paul, he will point you in the right direction), Sony intends to use XDR and Redwood interface technology not only with the broadband engine, but also in the Broadband Engine! Now you can draw all of your own conclusions from that.
...

Where did you guys read that?

I have found the link.
Toshiba/SCEI/Rambus Contract excerpt

Jaws said:
ultimate_end said:
...
Judging from Masakazu Suzuoki's Broadband engine/Visualizer combination, I would be willing to wager that Physics/animation processing capability has been very much at the front of people's minds at Sony.

They like to call it 'World Simulation' these days! :) I remember seeing a demo video of this (PS3 related) with some girl paddling in an ocean and interacting with a flock of birds flying around her but I can't seem to find it anymore! :?

Anyway this is what they (Okamoto) had to say at GDC2002,

http://archive.gamespy.com/gdc2002/okamoto/

Oh yes, Shinichi Okamoto's famous (infamous?) "we need 1000 times the power of PS2!" talk. Thanks for the link BTW. I just wish that someone from Sony would come out and clarify what exactly 1000x the power of x console means :) .
Yes "World Simulation" sounds about right. I can just imagine the SCE marketing department having a field day with this one! :LOL: Even Polyphony Digital's Kazunori Yamauchi has started using that phrase lately.

That demo video certainly sounds interesting. Anything like that would give some kind of idea of the future. Microsoft's XNA crash demo shows just the tip of the iceberg IMO.

This probably needs its own thread, but I think that too many people underestimate the impact that physics and animation will have next gen. When looked at with a critical eye, it's all too easy to see just how unrealistic today's games look, what with all the motion and skinning deficiencies, simple cycles of animation, cheap "ragdoll" physics engines and such. What will we do when graphics reach a point where most people can't tell the difference between two game consoles that may actually be quite disparate "power" wise? There is massively more to visuals than just rendering etc. Imagine a game world where almost everything interacts with everything else realistically. A lot of that is faked these days, but to have it calculated in real time is quite a leap forward in processing capability. One day we will look back at today's games and think they are hideous. And I really mean hideous!

Will this be possible with PS3? I'm not sure. Hell we don't even know if PS3 will even have a Broadband engine, let alone a TeraFLOPS Broadband Engine. But I like to say "maybe". I have a feeling we are about to be blown away ;) .

Anyway, I hope some people look at those two patent applications. I could be wrong, but I don't think they have been brought up on these forums before and AFAIK they are the only cell related patents that refer to clock speed.
I hope these links work: Multiphase clocking method and apparatus and Microprocessor chip simultaneous switching current reduction method and apparatus

As for the PS3 GPU, everybody should note that different parts of that chip may well run at different clock speeds. This is something Sony likes to do, as can be seen in the new EE+GS chip, where the Graphics Synthesizer runs at its normal ~147 MHz and the Emotion Engine runs at its normal ~294 MHz. This is important to note because, just like in an old-school (lol) PC, there is no T&L performed on the PS2's graphics chip, so the extra clock speed of the EE helps dramatically improve performance. Assuming for a moment that Sony uses the multi-Visualizer chip shown in the patent, we could assume (for a moment) that the PEs etc. would run at, say, 4 GHz and the Pixel Engines would run at (for example) 1 GHz. Assuming T&L was performed on the Visualizer, PS3 would have some very tasty polygon counts!
The PSP's chip is another example of a multi-clockspeed device that Sony makes...

But anyway, I had better stop there. I have some Covenant honour guard to deal to :D .
 
Oh yes, Shinichi Okamoto's famous (infamous?) "we need 1000 times the power of PS2!" talk. Thanks for the link BTW. I just wish that someone from sony would come out and clarify just what exactly 1000x the power of x console means exactly
That one's easy, using a polite description, it's called 'poetic license'. :p
 
James Kahle said:
"We've done a lot of work in the design center for proof of concept," he said. "I think the original 'Cell' vision was not (for) any one product."

Kahle said it was too soon to talk about some of the production specifics of "Cell," like the manufacturing process that would be used to make the chip or how soon it will be coming off production lines in volume.

But he said the chip would address many of the problems inherent in chip-making today, such as the difficulty of producing processors with smaller and smaller features, while keeping down their power requirements and heat output.

He also said he was spending 20% to 30% of his time thinking about products to follow up on "Cell," which is built to be reconfigured easily and without extensive redesign of the hardware itself.

"We're being fairly general purpose about it," he said.
 
ultimate_end said:
Also note that the Cell is almost a natural evolution of Emotion Engine which was designed by Toshiba.

Oh I don't doubt that Sony would have gone ahead with a massively parallel architecture, even without IBM's help. Kutaragi and Co absolutely love the idea. And there are some similarities in philosophy between Cell and the EE.
I wouldn't say so much that Cell is the natural evolution of the EE, but rather PS3 will be a natural evolution of the EE. The EE's VU0 was originally intended to be a physics/animation co-processor. That was very important. Judging from Masakazu Suzuoki's Broadband engine/Visualizer combination, I would be willing to wager that Physics/animation processing capability has been very much at the front of people's minds at Sony.

How about this:

EE MIPS core -> SGI Origin 3000 GSCube host -> Cell PU
EE asymmetric VU -> EE+GS GSCube version -> Cell symmetric APU
 
Add this...

"But he said the chip would address many of the problems inherent in chip-making today, such as the difficulty of producing processors with smaller and smaller features, while keeping down their power requirements and heat output." James Kahle

With...

"Toshiba and Sony have utilized 65-nm process to fabricate an embedded DRAM with a cell size of 0.11um2, which will enable a 256-megabit memory to be integrated on a single chip. It also fabricated the world's smallest embedded SRAM cell of only 0.6um2.....
.... the companies described some of the details of the process, including the development of a high-performance transistor with a 30-nm gate length.

Fabricated with 193-nm lithography tools and phase-shift photomasks, the transistor is said to have switching speeds of 0.72-ps for NMOSFET and 1.41-ps for PMOSFET at 0.85-Volt (Ioff=100nA/um).

The transistor makes use of a nitrogen concentration plasma nitrided, oxide-gate dielectrics to suppress gate leakage current. This optimization reduces leakage current approximately 50 times more efficiently than conventional silicon dioxide film and allows formation of an oxide with an effective thickness of only 1-nm.

To reduce wiring propagation delay and power dissipation, a low-k dielectric material is adopted. The target effective dielectric constant of the interlayer dielectric is around 2.7. " siliconestrategies.com

Looks like STI has put in some extensive work. Looks very promising!
 
Note that one way to solve power consumption problems is simply to use more processors spread over a larger die at lower clock speeds.
 
Note that one way to solve power consumption problems is simply to use more processors spread over a larger die at lower clock speeds.

Or high frequency but cycling through them, on and off.
 
ultimate_end said:
Jaws said:
ultimate_end said:
...
Now from what I have read (Ask Paul, he will point you in the right direction), Sony intends to use XDR and Redwood interface technology not only with the broadband engine, but also in the Broadband Engine! Now you can draw all of your own conclusions from that.
...

Where did you guys read that?

I have found the link.
Toshiba/SCEI/Rambus Contract excerpt

Thanks for link...It's a shame we can't read the whole doc! :p

ultimate_end said:
...
Anyway, I hope some people look at those two patent applications. I could be wrong, but I don't think they have been brought up on these forums before and AFAIK they are the only cell related patents that refer to clock speed.
I hope these links work: Multiphase clocking method and apparatus and Microprocessor chip simultaneous switching current reduction method and apparatus

As for the PS3 GPU, everybody should note that different parts of that chip may well run at different clockspeeds...

Btw, those patents were discussed in this thread but the patent links are broken (looks like most of these old threads have these broken links! ;) ).

Anyway, yeah I scanned them before and they're very important patents. It's quite conceivable that each PE could be on a ring bus on the BE (obviously it doesn't have to look like a ring, just a closed loop). Also the multi-clock for various components on the ICs has always intrigued me, especially how they would tackle the GPU's PUs, APUs, eDRAM, Pixel Engines, buses etc...

The reduction of total circuit induction by reducing di/dt is basically a smoothing circuit to reduce peak current draw. This would reduce the capacitance and hence the Power/Heat dissipation...nice! :p That alongside all the exotic components as quoted by Mythos for the 65nm process amongst other things to reduce current leakage etc. would set them off nicely in the right direction! :p

one said:
How about this:
...
EE MIPS core -> SGI Origin 3000 GSCube host -> Cell PU
EE asymmetric VU -> EE+GS GSCube version -> Cell symmetric APU

I'd go with something more like this,

The ONYX 3400 had 32 CPUs, the BE has 32 APUs...
The GScube had 16 EEs, the GPU (4VSs) have 16 APUs...
The GScube had 16 GSs, the GPU (4VSs) have 4 PixelEngines

So the ONYX + GSCUBE = BE + GPU (4VSs) ...

http://www.beyond3d.com/forum/viewtopic.php?p=366780#366780


BTW guys, the 67 Watts for the BE was just a simple guesstimate to show that it's in the realm of 'possibility' and not in the "f%$K, that's impossible" realm! :p

And Guden, "...Deadmeat reasoning..." ! :D Oh please...MfA started it!

*Runs and Hides*
 
Just been looking at this new 'Cross Bar patent' (thanks to 'one' for link :) )

CROSSBAR SWITCH, METHOD AND PROGRAM FOR CONTROLLING OPERATION THEREOF

And it looks like it's linked to the Salc/Salp patent,

Serial operation pipeline, arithmetic device, arithmetic-logic circuit and operation method using the serial operation pipeline


discussed in this thread. (Also another Pixel Engine candidate here...) Basically the Salc/Salps look like very good candidates for the Pixel Engines in the CELL chipset diagram likely for PS3 below,

BE_VS.jpg


Now these Pixel Engines would be formed from a 2D parallel cascaded array of Salc/Salps (serial bit ALUs) that form general, fully programmable, variable-stage pipelines. Sort of a 'microcosm' of CELL architecture but for pixel level granularity. It was discussed extensively in the above thread. However there was concern from developers about the lack of TMUs (texture mapping units) and the latencies involved.

This new 'crossbar' patent is assigned to 'JUNICHI', who also did the Salc/Salp patent, and he describes the crossbar circuit that connects to the Pixel Engine as well as the much needed TMUs, described as 'Buffer' below (they would fit nicely with the 'image cache' in the Cell diagram),

PixelEngine-cbar-tmu.jpg


and the 'pixel pipeline processor' would be the 256 Salps below (32 Salc = 1 Salp, and 256 Salp = 1 Pixel Engine = 256 32bit Flops/Ops per cycle),

pixelengine.jpg


This would be one Pixel Engine and the Cell diagram shows 4 Pixel Engines.

4 Pixel Engines would be 1024 32bit Flops/Ops per cycle
16 APUs (GPU) would be 128 32bit Flops/Ops per cycle
32 APUs (BE) would be 256 32bit Flops/Ops per cycle

Total GPU = 1152 32bit Flops/Ops per cycle
Total CPU = 256 32bit Flops/Ops per cycle
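
The tally above can be checked with a quick sketch. The 8 ops per APU per cycle figure is not stated outright; it is only what "16 APUs = 128" implies, and the unit counts come from the speculative diagram, so treat every constant here as an assumption rather than a spec.

```python
# Assumed per-unit throughput, in 32-bit Flops/Ops per cycle.
OPS_PER_APU = 8            # implied by "16 APUs = 128", not an official figure
OPS_PER_PIXEL_ENGINE = 256  # 256 Salps per Pixel Engine, one op each per cycle

# Unit counts taken from the speculative BE + Visualizer diagram.
GPU_PIXEL_ENGINES = 4
GPU_APUS = 16
CPU_APUS = 32


def ops_per_cycle(units, ops_per_unit):
    """Peak ops per cycle for a group of identical units."""
    return units * ops_per_unit


pixel_engine_ops = ops_per_cycle(GPU_PIXEL_ENGINES, OPS_PER_PIXEL_ENGINE)
gpu_apu_ops = ops_per_cycle(GPU_APUS, OPS_PER_APU)
cpu_ops = ops_per_cycle(CPU_APUS, OPS_PER_APU)
gpu_ops = pixel_engine_ops + gpu_apu_ops

print("Pixel Engines:", pixel_engine_ops)
print("GPU APUs:", gpu_apu_ops)
print("Total GPU:", gpu_ops)
print("Total CPU:", cpu_ops)
```

These are peak per-cycle numbers only. Turning them into FLOPS needs a clock speed for each part, which nobody outside the project knows yet.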

The CrossBar patent extensively discusses the manufacturability and operation of the crossbar, but also mentions this about the 'serial operation pipeline',

Fig. 1 is a structural diagram illustrating essential components of this image rendering device. In order to provide data necessary for image rendering processing, the image rendering device comprises a buffer (memory buffer) 1 as an example of a semiconductor device, a pixel pipeline processor 2 having a serial operation pipeline, and a crossbar switch 3 connected between the buffer 1 and the pixel pipeline processor 2 via interface components.

The buffer 1 is a data memory storing the above-mentioned data that are to be transmitted to the crossbar switch. In the present embodiment, as an example, data such as CLUT (Color Look Up Table) and texture for generating the entire color and pattern on a polygon by mapping are stored. The CLUT comprises a table for the three primary colors, R (red), G (green), and B (blue) and an α value table. The table for the three primary colors, R, G, and B is used for determining the color of each pixel of the texture, and the α value is a coefficient value for determining for each pixel the blend (α blending) ratio of images when the texture is mapped, that is, a coefficient value representing semi-transparency. The index for picking up the three primary colors R, G, and B from the CLUT (the value for specifying the table number in the CLUT) is defined for each pixel represented by the XY coordinates of the texture.

The pixel pipeline processor 2 conducts reading of data from the buffer 1 and also conducts texture mapping, comparison of Z coordinates, and pixel value calculations by a pipeline system.

The pixel pipeline processor 2 also conducts the processing of extracting the edge of image brightness, the processing of picking up the data for the three primary colors, R, G, and B from the CLUT according to the texture index and setting the color of each pixel, and the α blend processing using the α value (graded α value) picked up from the CLUT by using byte values of each pixel in a G plane as an index. Furthermore, the pixel pipeline processor 2 conducts processing such as scissoring, dithering, and color clamping.

Scissoring is a processing technique for deleting data that fall outside of a screen, dithering is a processing technique for incorporating the arrangement of colors for representing a large number of colors with a small color palette, and color clamping is a processing technique employed during color computation for limiting the value thereof so that it does not exceed 255 or does not become less than 0.

Data obtained by conducting the above-mentioned processing in the pixel pipeline processor 2 are stored in a frame buffer (not shown in the figures) and then converted into frame data (two-dimensional image data) rendered on a two-dimensional monitor screen.

Those frame data are then read out from the frame buffer, produced from an output terminal and sent to a two-dimensional monitor unit.
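
To make the CLUT lookup, alpha blend and color clamping steps in the quoted text a bit more concrete, here is a rough sketch. It is not taken from the patent: the table contents, the function names, and the blend formula (the usual src * alpha + dst * (1 - alpha)) are all assumptions for illustration.

```python
def clamp_color(value):
    """Color clamping: keep a channel value within 0..255."""
    return max(0, min(255, value))


def lookup_clut(clut, index):
    """Fetch an (R, G, B, alpha) entry from a color look-up table by index."""
    return clut[index]


def alpha_blend(src_rgb, dst_rgb, alpha):
    """Blend a texel over a destination pixel, alpha in 0.0..1.0.

    Uses the conventional src * alpha + dst * (1 - alpha) mix, then clamps
    each channel. The patent text only says alpha is a blend-ratio
    coefficient, so this exact formula is an assumption.
    """
    return tuple(
        clamp_color(round(s * alpha + d * (1.0 - alpha)))
        for s, d in zip(src_rgb, dst_rgb)
    )


# Hypothetical 4-entry CLUT: (R, G, B, alpha).
EXAMPLE_CLUT = [
    (0, 0, 0, 1.0),
    (200, 100, 0, 0.25),
    (255, 255, 255, 1.0),
    (0, 0, 255, 0.0),
]

texel_index = 1                 # per-pixel index stored in the texture
framebuffer_pixel = (0, 100, 200)

r, g, b, a = lookup_clut(EXAMPLE_CLUT, texel_index)
blended = alpha_blend((r, g, b), framebuffer_pixel, a)
print(blended)
```

The point is only that each pixel needs a table fetch before any arithmetic happens, which is why the buffer and crossbar latency matter so much for a pipeline like this.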

So what do you guys think of this new addition of TMUs and hiding latencies from the Salc/Salps? I know Pana was concerned about this?! :p
 
Jesus, Jaws... Give it a rest already! We'll find out in six months at next E3. You trying to patch together a ton of mismatching patents into a coherent single picture isn't going to work, it just makes you look like an obsessed loonie. :)
 
Guden Oden said:
Jesus, Jaws... Give it a rest already! We'll find out in six months at next E3. You trying to patch together a ton of mismatching patents into a coherent single picture isn't going to work, it just makes you look like an obsessed loonie. :)

Isn't that the fun of it? :D
 