Details trickle out on CELL processor...

Hmm, even the location of the dev facility matches (Austin, TX). As I wrote in my last comment, Sony/Toshiba's agenda was to get access to the know-how behind high-clock-speed, high-performance semiconductor technology. Toshiba's highest-performance CPU today is the TX99, at 800MHz. Toshiba can reach a new market by getting Cell in their lineup and can learn advanced microprocessor technology.

I can imagine Sony and Toshiba were planning in 1999 how to get access to this high-clock-speed processor technology - the possible candidates were IBM and Intel, so the choice was very easy I guess - consult the advanced microprocessor geniuses at IBM and get SOI.
 
one said:
Hmm, even the location of the dev facility matches (Austin, TX). As I wrote in my last comment, Sony/Toshiba's agenda was to get access to the know-how behind high-clock-speed, high-performance semiconductor technology. Toshiba's highest-performance CPU today is the TX99, at 800MHz. Toshiba can reach a new market by getting Cell in their lineup and can learn advanced microprocessor technology.

I can imagine Sony and Toshiba were planning in 1999 how to get access to this high-clock-speed processor technology - the possible candidates were IBM and Intel, so the choice was very easy I guess - consult the advanced microprocessor geniuses at IBM and get SOI.

I assume at 800mhz it outperforms a desktop processor like the g4/g5, p3/p4, or athlons at 800mhz?

And if mhz was the primary concern, wouldn't Intel be a better partner? Only recently, with the G5, has IBM gotten up to decent mhz speeds (in consumer-level hardware anyhow), whereas Intel was up to 3.2ghz by that point and is up to 3.8ghz now, and with water cooling would probably be getting close to 5ghz, like I think they had planned to by now. That's twice the speed of the fastest G5 I know of, and I'd assume a much better performer in just about everything.
 
Fox5 said:
I assume at 800mhz it outperforms a desktop processor like the g4/g5, p3/p4, or athlons at 800mhz?

Toshiba's processors are for embedded use such as high-end printers and set-top boxes, so not that power-hungry and not that powerful.

Simply put, Toshiba didn't have a customer that needed high-performance chips, unlike Fujitsu, which designs SPARC64 for the server market. Sony's demand for the Emotion Engine was the first sign of Toshiba's entry into making high-performance MPUs. Likewise, IBM can't make a new processor without a customer.

As for Intel, well, I shouldn't have put it in 'possible candidates', as it can hardly be called an IP company... :p
 
Brimstone said:
Guden Oden said:
The SPU is 1.15 Ghz, not the 4.6 Ghz.

What makes you so sure?

That is an incredible clock speed. I don't think there is anything even remotely close to that speed in DSP's, CPU's, or whatever.

So what about the below?

20.1 An 8GHz Floating Point Multiply
8:30 AM
W. Belluomini, D. Jamsek, A. Martin, C. McDowell, R. Montoye,
T. Nguyen, H. Ngo, J. Sawada, I. Vo, R. Datta

IBM, Austin, TX

The implementation of the mantissa portion of a floating-point multiply (54x54b) is described. The 0.124mm2 multiplier is implemented using limited switch dynamic logic and operates at speeds up to 8GHz in a 90nm SOI technology. The multiplier dissipates between 150mW and 1.8W as it scales between 2GHz and 8GHz.

Guess it must be all make believe?
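
For what it's worth, the power figures in that abstract can be turned into a rough energy-per-cycle number. This is only a back-of-the-envelope sketch: it assumes the 150mW figure pairs with 2GHz and the 1.8W figure with 8GHz (the abstract gives the two ranges but does not spell out the pairing), and the function name is made up for illustration.

```python
def energy_per_cycle_pj(power_w, freq_ghz):
    """Energy per clock cycle in picojoules: E = P / f.

    W / GHz gives nanojoules, so multiply by 1000 to get picojoules.
    """
    return power_w / freq_ghz * 1000.0


# Assumed pairing of the abstract's endpoints (not stated explicitly there).
low = energy_per_cycle_pj(0.150, 2.0)   # slow, low-power end
high = energy_per_cycle_pj(1.8, 8.0)    # fast, high-power end

print(f"2GHz: ~{low:.0f} pJ per cycle")
print(f"8GHz: ~{high:.0f} pJ per cycle")
print(f"power ratio: {1.8 / 0.150:.0f}x, energy-per-cycle ratio: {high / low:.1f}x")
```

Under those assumptions, going 4x faster costs about 12x the power, or roughly 3x the energy for each cycle, which would be consistent with the supply voltage being raised to reach the top speed.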
 
Cryect said:
Brimstone said:
Guden Oden said:
The SPU is 1.15 Ghz, not the 4.6 Ghz.

What makes you so sure?

That is an incredible clock speed. I don't think there is anything even remotely close to that speed in DSP's, CPU's, or whatever.

So what about the below?

20.1 An 8GHz Floating Point Multiply
8:30 AM
W. Belluomini, D. Jamsek, A. Martin, C. McDowell, R. Montoye,
T. Nguyen, H. Ngo, J. Sawada, I. Vo, R. Datta

IBM, Austin, TX

The implementation of the mantissa portion of a floating-point multiply (54x54b) is described. The 0.124mm2 multiplier is implemented using limited switch dynamic logic and operates at speeds up to 8GHz in a 90nm SOI technology. The multiplier dissipates between 150mW and 1.8W as it scales between 2GHz and 8GHz.

Guess it must be all make believe?

Does this all tie in nicely with asynchronous clock speeds on chip as rumored or mentioned in the past with these emerging techs, even Cell? Or am I mixing this up with something else?
 
Fox5 said:
Brimstone said:
This will replace the Power PC 400 series they sold off to Applied Micro Devices.

What's AMD going to do with a PowerPC series? Rip it apart and copy the design?

Umm, Brimstone should have said Applied Micro Circuits, or AMCC. Not the same company as Advanced Micro Devices, or AMD.

http://www.amcc.com/

http://www.amd.com/

[EDIT]
Oops. I posted before seeing his correction. :oops:
[/EDIT]

Tommy McClain
 
Cryect said:
Brimstone said:
Guden Oden said:
The SPU is 1.15 Ghz, not the 4.6 Ghz.

What makes you so sure?

That is an incredible clock speed. I don't think there is anything even remotely close to that speed in DSP's, CPU's, or whatever.

So what about the below?

20.1 An 8GHz Floating Point Multiply
8:30 AM
W. Belluomini, D. Jamsek, A. Martin, C. McDowell, R. Montoye,
T. Nguyen, H. Ngo, J. Sawada, I. Vo, R. Datta

IBM, Austin, TX

The implementation of the mantissa portion of a floating-point multiply (54x54b) is described. The 0.124mm2 multiplier is implemented using limited switch dynamic logic and operates at speeds up to 8GHz in a 90nm SOI technology. The multiplier dissipates between 150mW and 1.8W as it scales between 2GHz and 8GHz.

Guess it must be all make believe?

So Texas Instruments DSP market is destroyed in one fell swoop?
 
Doubt TI's DSP market is going anywhere at the moment considering they are cheaper in general than IBM's stuff.

Also, who knows what that multiply unit is for, Jov. It could be CELL, but I doubt it, since everything else CELL-related seems to be labeled. Still, it's good to see that the same IBM research site is developing processing units that go up to 8GHz.
 
Cryect said:
Doubt TI's DSP market is going anywhere at the moment considering they are cheaper in general than IBM's stuff.

Also, who knows what that multiply unit is for, Jov. It could be CELL, but I doubt it, since everything else CELL-related seems to be labeled. Still, it's good to see that the same IBM research site is developing processing units that go up to 8GHz.

The best Texas Instruments has is 1 Ghz on the 90 nm node. So "CELL" with the "SPU's" will run at 4.6 Ghz? That is a huge performance gap, and "SPU's" are more flexible. Now, Texas Instruments DSP's compared to "CELL" might be polar opposites when it comes to cost. There are probably a lot of other things I don't understand, but I'm still very skeptical.

Texas Instruments NOV 8 2004 The World’s First 90nm DSPs Running at 1GHz are Now in Volume Production
 
brimstone said:
The SPU is the new Power PC 300 series as that was reported as a rumor a while back on Apple Insider in December of 2003.

S/APUs are basically a fancy version of VUs - they are by no means intended as standalone processors, and I doubt they have much to do with any PPC at all.

The controlling processor (aka PU) is the one that was always speculated to be derived from PPC series, and now confirmed. Which series exactly I wouldn't know, but maybe it is the 300 you mention here.

As for clock speeds, personally I always expected S/APUs to run at full speed (and we knew the preferred embodiment was 4ghz before), while the PU possibly not. Though that will largely depend on just what kind of core the PU is, and we still don't know that.
 
Brimstone said:
but I'm still very skeptical.

ISSCC is the most authoritative conference in the semiconductor academic society and a paper without an actually running sample is not accepted. Don't confuse it with self-proclaimed press releases.
 
one said:
ISSCC is the most authoritative conference in the semiconductor academic society and a paper without an actually running sample is not accepted. Don't confuse it with self-proclaimed press releases.

I know, that's why I find these 'it can't be' conspiracy theorists so hard to understand. They can't just say they have a 4.6GHz working 90nm SOI Cell or a 4.8GHz SRAM operating in a S|APU or an 8GHz FPU and the IEEE is just going to say, 'Oh, ok, whatever you say... we trust you.'

Fafalada said:

I've been using S|APUs for some time as it looks more badass, please conform. :p

Pana said:
The SPU (in CELL related IBM literature and patent portfolio) would be the Synergistic Processing Unit, another name for the APU we have seen in the CELL patents which is as patents say a versatile "stream processor with 4-way SIMD engine"

I've heard that a Synergistic PU is an APU 'core' with a local flow-controller and some sort of cache. Something along the lines of what I stated here. We'll see if it turns out to be true?
 
Do you (yes..you little cell fanb0y ;) ) want to know if APUs can issue asynchronous requests to the DMA controller?
Then you just have to read it here:

Method for asynchronous DMA command completion notification
The present invention provides for asynchronous DMA command completion notification in a computer system. A command tag, associated with a plurality DMA command is generated. A DMA data movement command having the command tag is grouped with another DMA data movement command having the command tag. DMA commands belonging to the same tag group are monitored to see whether all DMA commands of the same tag group are completed.

ciao,
Marco

[EDIT] I bet Fafalada will find this patent very interesting..do you remember our discussions about APUs acting like TMUs? :)
 
Brimstone said:
The best Texas Instruments has is 1 Ghz on the 90 nm node. So "CELL" with the "SPU's" will run at 4.6 Ghz? That is a huge performance gap

WTF is this?! You're again adding 1 and 1 and believing 5 is the correct answer. What does TIs 90nm manufacturing have to do with IBMs or Sony-Toshiba's? What does TIs DSPs have to do with Cell?

Answer: big, fat NOTHING!

Because of a TI DSP, you believe DM's ass-numbers are correct? That's nutty, man. Instead of a silly DSP, why not consider the prescott P4, which regularly overclocks to 4-ish GHz, with its double-clocked ALUs running at 8. What do you say about that, huh?
 
A lot of very interesting quotes from that patent:
when the APU generates and queues a DMA command over bus to the DMAQ, the APU attaches a tag group indicia as well. The tag group indicia indicates to which specific collection or group of commands the DMA command belongs.
Once in the DMAQ, the commands are ordered to be executed by the DMA engine. These commands are transmitted to other devices over the command bus. Once the commands have been executed, and the DMA engine has been so notified of its completion over the command bus, the DMA engine orders the decrement of the count of the tag counter in the tag counter register corresponding to the tag group of the completed DMA command.
The APU program of the APU can then determine the tag register status of a selected tag group. This is determined through checking the tag status register, such as by checking an additional tag status channel defined within the command bus line from the APU to the DMA engine. The tag status channel has a value of "1" or a "0" for that tag group ID. A value of "1" can mean that there is at least one command in the group outstanding, and a value of "0" can mean that there are presently no commands in the tag group outstanding. For a given tag group ID, the APU only determines whether the DMA engine 130 is finished with a particular tag group, not how many more executions a particular tag group has to go. Based upon this information, the APU can make appropriate processing decisions.
With the flexibility of this approach, software can group DMA commands in order to manage them. For instance all commands for a particular "task" can be grouped into a single tag group. Alternatively, all DMA "get" commands can be placed in a group separate from an output group comprising all DMA "put" commands. In addition, hardware can provide additional command parallelism or ordering rules with respect to groups. The APU software can verify that a single group has completed, all groups have completed, or a specified set of groups have completed operations.
There are several variations on the above and a number of advantages associated with the different variations. In one embodiment, the DMA queue 135 can store up to 32 DMA commands. All DMA commands in the DMA queue could have the same tag group number, they could all have different tag group numbers, or anything in between.

ciao,
Marco
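
The tag-group bookkeeping in those patent quotes can be sketched in a few lines. This is only a toy software model under my own assumptions, not how the hardware is built: the class and method names are invented, commands complete strictly in queue order (the patent text does not say that), and the only details taken from the quotes are the per-group counter, the 1/0 status value, and the 32-entry queue of the one embodiment.

```python
from collections import deque


class DmaEngineModel:
    """Toy model of per-tag-group completion tracking (names are hypothetical)."""

    QUEUE_DEPTH = 32  # "the DMA queue ... can store up to 32 DMA commands"

    def __init__(self):
        self.queue = deque()    # pending (tag, command) pairs
        self.outstanding = {}   # tag group id -> number of unfinished commands

    def enqueue(self, tag, command):
        """APU side: queue a command with its tag group id attached."""
        if len(self.queue) >= self.QUEUE_DEPTH:
            raise RuntimeError("DMA queue full")
        self.queue.append((tag, command))
        self.outstanding[tag] = self.outstanding.get(tag, 0) + 1

    def complete_next(self):
        """Engine side: finish the oldest command, decrement its group counter."""
        tag, command = self.queue.popleft()
        self.outstanding[tag] -= 1
        return command

    def tag_status(self, tag):
        """1 if at least one command in the group is outstanding, else 0."""
        return 1 if self.outstanding.get(tag, 0) > 0 else 0

    def groups_done(self, tags):
        """True when every listed tag group has no outstanding commands."""
        return all(self.tag_status(t) == 0 for t in tags)


# Group all "get" commands under tag 0 and all "put" commands under tag 1.
engine = DmaEngineModel()
engine.enqueue(0, "get A")
engine.enqueue(0, "get B")
engine.enqueue(1, "put C")

engine.complete_next()                    # "get A" finishes
status_mid = engine.tag_status(0)         # "get B" is still outstanding
engine.complete_next()                    # "get B" finishes
status_gets = engine.tag_status(0)
status_puts = engine.tag_status(1)
print(status_mid, status_gets, status_puts)
```

Note that the status only says whether a group is finished, not how many commands remain, which matches the quote: the APU program polls a group and then decides what to work on next.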
 
This means that the DMA engine can have multiple outstanding transactions. MFA is going to like that; now he can explicitly (vertically) interleave threads at the software level to keep the DMA engine busy. :)

Cheers
Gubbi
 
Guden Oden said:
Brimstone said:
The best Texas Instruments has is 1 Ghz on the 90 nm node. So "CELL" with the "SPU's" will run at 4.6 Ghz? That is a huge performance gap

WTF is this?! You're again adding 1 and 1 and believing 5 is the correct answer. What does TIs 90nm manufacturing have to do with IBMs or Sony-Toshiba's? What does TIs DSPs have to do with Cell?

Answer: big, fat NOTHING!

Because of a TI DSP, you believe DM's ass-numbers are correct? That's nutty, man. Instead of a silly DSP, why not consider the prescott P4, which regularly overclocks to 4-ish GHz, with its double-clocked ALUs running at 8. What do you say about that, huh?


The consumer products that "CELL" goes into will be in the same arena as DSP's. Stream processors are more efficient than DSP's, but I've never seen any evidence for a stream processor clocking faster than a DSP. From what I've read the Imagine processor is projected to hit 1 Ghz by 2007 on a 45 nm process. Modern GPU's contain many ALU units and I've never read of any company claiming such high speeds. Sometimes people even refer to modern GPU's as stream processors. I'm not aware of any GPU breaking the 1 Ghz mark yet, let alone 4.6 Ghz.


The SPU has an ALU that receives instructions from a VLIW; I don't think the P4's ALU's do this. Also, the ALU design in the Athlon yields better results in benchmarks, if I remember correctly.


I found this article on a 7 Ghz ALU from Intel.



Researchers from the Santa Clara, Calif.-based chipmaker will present papers at the International Solid State Circuits Conference (ISSCC) in San Francisco that will describe, among other projects, a low-power, high-speed arithmetic logic unit (ALU) that can run both 32-bit and 64-bit code. This, in turn, could allow the company to make Pentium-class chips that could run both types of software. The ALU churns calculations with whole numbers instead of decimals.

...

The ALU runs at more than 7GHz in 32-bit mode and at 4GHz in 64-bit mode. Compared with existing Intel ALUs, the prototype unit increases performance by 20 percent and reduces power consumption by 56 percent, Borkar said. In the Pentium 4 family, the ALU runs at twice the speed of the chip, so the part would fit into a Pentium 4 style chip that would run at 3.5GHz.

The ALU, made on the 90-nanometer process, could be inserted into either Pentium-class chips (so they could run 64-bit software) or Itanium chips (so they could run standard Windows code better), said sources close to the company. The company is looking at a few different ALUs, sources said. Borkar declined to comment on product plans, but said it could go into a next-generation Pentium-style chip.

While the ALU would improve processor performance, the paper on memory describes a way to cure one of the chronic bottlenecks inside computers: getting data in and out of memory.

http://news.com.com/Intel+offers+peek+at+future+chips/2100-1006_3-5159754.html
 
I'm not aware of any GPU breaking the 1 Ghz mark yet, let alone 4.6 Ghz.

GPUs don't get fine-tuned transistors the way high-end CPUs do. That's why we get new architectures as quickly as we do now. The architecture is still evolving, so there is little reason to fine-tune. Eventually they will settle, and then the race for clock speed and better process begins.
 
Brimstone said:
The consumer products that "CELL" goes into will be in the same arena as DSP's.

One: nobody's using DSPs for the job the BE will do in PS3. People DID use DSPs to do 3D graphics calculations once, long ago, in arcade solutions and graphics supercomputers, but that was friggin' ages ago.

Furthermore, Cell isn't a DSP. What's your point bringing DSPs into this? You're just talking nonsense here.

Stream processors are more efficient than DSP's, but I've never seen any evidence for a stream processor clocking faster than a DSP.

You're comparing apples and oranges. There's nothing that says that a "stream processor" (whatever you may want to put in that context) can't be clocked faster than a certain level. There's also no evidence the BE will meet the criteria matched by those chips you've labelled "stream processors" - whichever they may be. More apples-oranges BS.

From what I've read the Imagine processor is projected to hit 1 Ghz by 2007 on a 45 nm process.

And that's relevant to BE in what way exactly?

Reminder for the dull-witted: Imagine is not Cell-based!

Modern GPU's contain many ALU units and I've never read of any company claiming such high speeds.

Modern GPUs aren't Cell-based EITHER. Again, what's your point bringing up all this irrelevant BS?

The SPU has an ALU that receives instructions from a VLIW; I don't think the P4's ALU's do this.

AFAIK, there's no VLIW stuff at all in Cell. Besides - relevance?

My point was to show there's no problem designing very high switching speed components on a 90nm process. It doesn't even have to be SOI tech either - Prescott doesn't use that AFAIK. The inner workings of the ALU, VLIW or not, don't matter; clock speed doesn't hinge on this factor.

Also, the ALU design in the Athlon yields better results in benchmarks, if I remember correctly.

Again: RELEVANCE?! :rolleyes:

I ask you once more: why would Cell, in your opinion, use a cache that runs at 4x the clock of the ALU it feeds? You never answered this.
 