Cell mass production plan for 2nd half of 2005

Paul said:
The website has some type of protection on its pages; you can type "ps3 gauntlet" into Google and it's the first result, though.

Kenshi Manabe, vice president of SCEI semiconductors, was Sony CTO when he said this.
This one?

CVG said:
Thursday 30th January 2003

SONY THROWS DOWN PS3 GAUNTLET

Microsoft to lead with Xbox 2? Not if Sony has anything to do with it...

14:29 Sony Computer Entertainment has gone on the record in Japan to state it has brought forward the planned release of PlayStation 3, which may bring it ahead of Xbox 2 in 2005.

At a conference this week, Sony CTO Kenshi Manabe said: "We intend to launch the successor to PlayStation 2 ahead of our initial schedule, which was drawn up some years ago. This, we believe, will bring us ahead of Microsoft, which is planning a new console towards the end of 2005."

With Microsoft repeatedly insisting it will be first out of the door and Nintendo recently confirming it will release alongside the competition, the fight is firmly on. Who's your money on?
 
panajev, I'm still of the belief that the design, as it currently stands, is bandwidth limited despite claims of large numbers of registers, if only because I don't believe 32 piles of 128K SRAM with 256-bit buses each are possible. I can't see how they would actually design something that isn't choked here. Cell could sound reasonable, just not as you describe it.
 
The APU is clearly described as having a 256-bit bus from the 128 KB of SRAM to the register file.

The number of APUs will depend on the effective throughput of each APU (which might or might not be more than 8 ops per cycle) and the clock frequency of the final design.

You can say that you do not believe in 1 TFLOPS of aggregate performance because you do not believe in having as many APUs and FMACs as we do, but to sustain your argument about terrible bandwidth limitations you also have to believe that a single APU is terribly bandwidth limited at 32 GFLOPS.

Do you believe that to be the case ?

Of course no design exactly matches its theoretical peak numbers in real-world scenarios (none that I have heard of, at least ;) ), but 128 registers in each APU, compared to 32 registers in each of the EE's VUs, would limit the pressure put on the LS and thus limit the slowdown that using the LS would bring (which might not even be that much, considering that the APU's LS bandwidth would be quite high, as my "fuzzy math" showed you [and a 256-bit bus together with a decent clock frequency for said local bus can help make that a reality]).

There are lots of different kinds of bandwidth limitations, and no high-performance design is completely devoid of such problems.
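
As a rough sanity check on the figures in this post, here is a minimal Python sketch of the arithmetic. It only uses numbers quoted in the thread (8 ops per cycle, 32 GFLOPS per APU, 32 APUs, a 256-bit LS bus); the clock is derived from those figures rather than taken from any spec, and the assumption that the LS bus runs at the core clock is mine.

```python
# Back-of-the-envelope check of the per-APU figures quoted in this thread.
# None of these are confirmed specs; they are the thread's working numbers.
OPS_PER_CYCLE = 8      # FP ops per cycle per APU (quoted above)
GFLOPS_PER_APU = 32    # peak per APU (quoted above)
APUS = 32              # APUs on the CPU (the "baseline" in this thread)
LS_BUS_BITS = 256      # LS-to-register-file bus width (quoted above)


def implied_clock_ghz(gflops, ops_per_cycle):
    """Clock needed to reach `gflops` at `ops_per_cycle` ops per cycle."""
    return gflops / ops_per_cycle


def ls_bandwidth_gb_s(bus_bits, clock_ghz):
    """Peak LS bandwidth, ASSUMING the bus moves bus_bits every core cycle."""
    return bus_bits / 8 * clock_ghz


clock = implied_clock_ghz(GFLOPS_PER_APU, OPS_PER_CYCLE)
aggregate_gflops = GFLOPS_PER_APU * APUS
ls_bw = ls_bandwidth_gb_s(LS_BUS_BITS, clock)
bytes_per_flop = ls_bw / GFLOPS_PER_APU

print(f"implied clock:        {clock} GHz")
print(f"aggregate peak:       {aggregate_gflops} GFLOPS")
print(f"LS bandwidth per APU: {ls_bw} GB/s")
print(f"LS bytes per FLOP:    {bytes_per_flop}")
```

Under those assumptions the 1 TFLOPS figure is simply 32 APUs times 32 GFLOPS, and each APU would see about 4 bytes of LS bandwidth per peak floating-point op, before counting any reuse from the register file.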
 
Panajev, I know this has been gone over before, but forgive me, my memory is very short sometimes: do you think it's conceivable that the PS3 CPU alone could have 64 APUs?
(I know 32 is the baseline)

If PS3 were based precisely on the two-chip example in the Sony Cell patent, there would be 48 APUs for the whole machine (32 in CPU plus 16 in GPU).

Obviously there are many possibilities. The CPU might have only 16 APUs and the GPU only 8 APUs. I cannot wait to learn about the final design, probably next year :)
 
Panajev said:
but 128 Registers in each APU compared to 32 Registers in each of the EE's VUs would limit the pressure put on the LS and thus limit the slowdown that using the LS would bring
I don't know why you should assume "slowdown" from accessing LS; the whole point of having it is to remove memory latency dependencies from APUs, much the same way that is done in VUs.
 
Panajev, the more we talk about the "incredible" Cell-puter the less plausible it seems. Problems are solved by unbelievable solutions and the performance of this thing is based on holistic things like going at high clock speeds and having lots of APUs and eDRAM. When it comes down to it, I can only ask "how does this even fit on a chip?" The bandwidth problem is just another thing I have trouble with and it won't be the last. I don't think it could reach 1 TFLOP anymore, probably a small-ish fraction of that. I suspect that this is some sort of backdoor method of producing a GPU and only does a lot of vector and matrix ops. Nothing world-changing here and no revolution in computing, probably not even a threat to the desktop. It will be a good, possibly great design for a console though, and it will stay a good or great design for a console. Just another person not so optimistic here.
 
nonamer said:
Panajev, the more we talk about the "incredible" Cell-puter the less plausible it seems. Problems are solved by unbelievable solutions and the performance of this thing is based on holistic things like going at high clock speeds and having lots of APUs and eDRAM. When it comes down to it, I can only ask "how does this even fit on a chip?" The bandwidth problem is just another thing I have trouble with and it won't be the last.

Keep in mind you're still not grasping anything Faf or Panajev has been trying to explain to you. So this isn't exactly something that surprises me.

I don't think it could reach 1TFLOP anymore, probably a small-ish fraction of that. I suspect that this is some sort of backdoor method of producing a GPU and only does a lot of vector and matrix ops. Nothing world-changing here and no revolution in computing, probably not even a threat to the desktop.

You are very confused. To start, no, this isn't a traditional superscalar core with a single vector processor thrown in, times 4 per die, a mindset that you keep applying to this, with major problems. Your entire argument on bandwidth is, truthfully, a joke.

And saying a "backdoor method" is like they're sneaking this in... this is obvious to anyone knowledgeable that's been following this topic for the last 3 years. The difference between a GPU and CPU is quickly ending. What we're seeing now is the creation of a nearly complete GPU instruction set in DirectX which will put GPUs on a similar footing as CPUs with respect to that aspect. There will be key differences though in the architectures, making a GPU distinctive from a CPU.

Cell, as seen in the PS3, is more on the GPU side of things due to its concurrency. It's not, by and large, a 4-core processor (e.g. Intel's 4-core Itanium); it's a 32-way vector processor. The bandwidth issues are a joke because if you look at modern GPUs you'll see they're "suffering" from the same illnesses that Panajev and Faf spoke of concerning registers. There is no SRAM on them to any great extent, and there is mediocre bandwidth devoted to the non-dumb functions of the GPUs. Yet they don't suffer from your imaginary bandwidth wall. If anything, we've seen benchmarks where bandwidth is irrelevant.

Which makes sense when you look at what they're processing (e.g. shaders) and what would be the limiting factor in a world of just shaders (e.g. logic).

Or look at the PS2's VUs, which are treated in a similar vein. Again, Faf has explained why you're wrong on this point. I think you're just very confused and perceiving this wrong. I forget who it was, but someone on here said that the Cell architecture made a whole lot more sense once they started programming for PS2, which would make sense.

And this whole "threat to the desktop" - WTF? This is an IC destined for the consumer electronics marketplace, perhaps some IBM servers. You're creating hype, hype that nobody serious is talking about... and then bashing it because it doesn't live up to your fallacious expectations.

It will be a good, possibly great design for a console though, and it will stay a good or great design for a console. Just another person not so optimistic here.

I think you're not optimistic because you just don't get it. The whole architecture is revolutionary for the way it passes, processes and manipulates information. But underlying this is an architecture made out of silicon like any other. It will be revolutionary for the way it's produced (at that time) and it'll be revolutionary in the way it's used. Everything else is quite ordinary.
 
Sorry Vince, but it's Panajev's explanation that teeters off into no-man's land with the concept of an 8000-bit wide cache bus. I can see how you could feed one or a few APUs, but not 32 of them on a single chip, regardless of the number of registers (Panajev, I can see an APU not being bandwidth limited, but try fitting 32 of them on a single chip and then tell me how you'll put the whole thing together). And Vince tells me that I'm overhyping the PS3? Vince, you're its biggest hyper, and even in that very post you're saying that it's going to be a revolutionary product. I simply disagree. It may have seemed like that to me at one point, but not now. I simply see it as what I say it should be: a video game console.
 
nonamer said:
Sorry Vince, but it's Panajev's explanation that teeters off into no-man's land with the concept of an 8000-bit wide cache bus.

Where does he mention an 8K-bit-wide cache bus? I can see the aggregate of all the buses being like a "virtual" one of similar size, but that's to be expected at such a low level. Which is kinda where that whole concurrency of design thing I preach comes in.
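
To make the "virtual" bus point concrete, here is a small Python sketch of where a number like 8000 bits could come from. It assumes the figures used elsewhere in this thread (32 APUs, 8 APUs per PE, one private 256-bit LS bus per APU); it is an illustration of the arithmetic, not a description of the actual chip.

```python
# Aggregate width of the per-APU local-storage buses.
# Assumed figures (taken from the thread, not from a confirmed spec):
APUS = 32               # APUs on the chip
APUS_PER_PE = 8         # APUs grouped under one PE
BUS_BITS_PER_APU = 256  # each APU's private LS-to-register bus

# Each bus is private to its APU, so nothing 8192 bits wide ever has to be
# routed across the die; the sum is only a "virtual" aggregate width.
aggregate_bits = APUS * BUS_BITS_PER_APU
bits_per_pe = APUS_PER_PE * BUS_BITS_PER_APU

print(f"aggregate of all LS buses: {aggregate_bits} bits")
print(f"aggregate within one PE:   {bits_per_pe} bits")
```

So the roughly 8K-bit figure is the sum of 32 short, independent 256-bit paths rather than a single shared cache bus.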

I can see how you could feed one or a few APUs, but not 32 of them on a single chip, regardless of the number of registers

First of all, Cell is amazingly well thought out IMHO. It's never really feeding 32 APUs, which is kinda what's well thought out. For the largest percentage of processing time, it'll be one APU feeding itself. For a smaller percentage it'll be an APU feeding off the SRAM, and for an even smaller percentage it'll be the 8 APUs feeding off the eDRAM under a PE. It'll never be 32.

And why this is neat is that, because such concurrency and interdependence are possible, you get massive bandwidth advantages from having such a level of parallelization, ergo that 8 TB/sec bandwidth at a low, operational level.

EDIT: Just to explain further, right now I can't see anything in the design whose "correction" would yield a proportionally massive increase, thus it's "well thought out." But I wouldn't be surprised to find such things once we learn more.

And Vince tells me that I'm overhyping the PS3? Vince, you're its biggest hyper, and even in that very post you're saying that it's going to be a revolutionary product. I simply disagree. It may have seemed like that to me at one point, but not now. I simply see it as what I say it should be: a video game console.

Yes, you're overhyping it by saying it'll be things it won't be. I may be the biggest hyper, but I stick to things that it can do by design, like revolutionize the sharing of data over broadband networks in the consumer's household electronics or palm. You're the one talking about revolutionizing PCs and other BS.

As for what it is: well, what did you think it was before, the second coming of Christ? Give me a break, it was always an architecture designed for consumer electronics.
 
Look at the R3xx VPU series: it's got 4 vertex shaders / geometry engines plus 8 pixel pipelines. That's 12 main functional units, plus other sub-units for other things. R3xx only has 107~120M transistors. The PS3's Cell-based CPU will have at least 500M transistors; probably 200~300M will be for logic. I don't see why it couldn't have 32 APUs plus the other stuff.
 
Megadrive1988, transistor count is irrelevant as a metric. Just worry about area. The transistor count will probably be hyperinflated anyway due to the embedded DRAM.
 
Fafalada said:
Panajev said:
but 128 Registers in each APU compared to 32 Registers in each of the EE's VUs would limit the pressure put on the LS and thus limit the slowdown that using the LS would bring
I don't know why you should assume "slowdown" from accessing LS; the whole point of having it is to remove memory latency dependencies from APUs, much the same way that is done in VUs.

Well, you can assume it will be practically free, but it is not exactly free.

LS can only provide 256 bits per cycle.

This basically means 2 registers.

A vector MADD requires 3 input registers, so unless one of the three registers we need is already loaded with data, we will need 1 more cycle to load the third register.

This can be hidden a bit in the vector MADD (it takes 4 cycles [throughput]).

However, any slowdown this might cause would be less of a problem thanks to the large number of registers.

What if VU1 and VU0 only had 8-16 FP registers ? Would performance change ? I'd say it would ( which is why I was stressing the importance of having 128 registers ).
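
The load argument above can be sketched as a toy model in Python. It assumes what this post states (LS delivers 256 bits per cycle, which "basically means 2 registers", so 128-bit registers; a vector MADD takes 3 inputs and has a 4-cycle throughput). Treating a load as "hidden" whenever it fits inside those 4 cycles is my simplification, not something the patent says.

```python
import math

# Figures quoted in this post; the model around them is a simplification.
LS_BITS_PER_CYCLE = 256      # what the LS can deliver each cycle
REGISTER_BITS = 128          # implied by "256 bits ... means 2 registers"
MADD_INPUTS = 3              # a vector MADD reads three source registers
MADD_THROUGHPUT_CYCLES = 4   # quoted throughput of the vector MADD


def load_cycles(operands_to_load):
    """Cycles needed to pull this many registers' worth of data from LS."""
    regs_per_cycle = LS_BITS_PER_CYCLE // REGISTER_BITS
    return math.ceil(operands_to_load / regs_per_cycle)


# The more operands already sit in the register file, the fewer LS cycles
# each MADD costs. That is the argument for having 128 registers.
for resident in range(MADD_INPUTS + 1):
    to_load = MADD_INPUTS - resident
    cycles = load_cycles(to_load)
    hidden = cycles <= MADD_THROUGHPUT_CYCLES
    print(f"{resident} operands resident -> {cycles} load cycle(s), "
          f"fits under MADD throughput: {hidden}")
```

With nothing resident, three operands need two LS cycles, which is the "1 more cycle" mentioned above; with one or two operands resident, a single cycle is enough.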
 
Sorry Vince, but it's Panajev's explanation that teeters off into no-man's land with the concept of an 8000-bit wide cache bus.

It isn't 8000-bit. When he said local memory, it's local to the APU. It's not a pool of memory with an 8000-bit bus that every APU has access to.

Chryz,

Thanks buddy, now I remember that quote. It's quite old, but I guess that's the latest on their PS3 launch plan for now.
 
It isn't 8000-bit. When he said local memory, it's local to the APU. It's not a pool of memory with an 8000-bit bus that every APU has access to.

V3, respect the line... Fafalada, Vince and I got there first :p

You did start the thread though ;)
 
Bloomberg Japan (translation by www.excite.co.jp) said:
Toshiba and U.S. Rambus: For high-speed data transmission technical development-PS2 succeeding machines October 14 (Bloomberg): The U.S. Rambus company of Toshiba and a semiconductor development company announced that it developed 14 days and the technology of transmitting data among two or more MPU (micro processing unit) 6 times as at high speed as the former. It opens to the public at the show "a microprocessor forum" of MPU by which San Jose (California), U.S., holding is carried out for three days from the same day. In collaboration with Sony Computer Entertainment (S C E) and U.S. IBM, Toshiba is developing [ be / it ] a new model MPU (micro processing unit) "CELL (cell)" in order to use for the succeeding machine of the home video game machine "PlayStation 2 (PS2)" of S C E etc. For this reason, three companies, Toshiba, Sony, and S C E, conclude a Rambus company and a licensing agreement in order to pull out the performance of a cell in January, this year to the maximum extent. It was decided that the technology of a Rambus company was used for the data transmission by MPU, and the data transmission between MPU and DRAM (it is a write-in read-out memory at any time [ which needs memory maintenance operation ]), respectively. Toshiba is going to produce a cell commercially in the 2005 fiscal year using the processing line width of 65nm (nano is 1 of 1 billion portions), and detailed process technology. At this show, data transmission is demonstrated instead of the cell under development among two or more system LSIs for an examination which used line width the technology of 90n. A closing price is 504 yen, yen [ end ratio / 13 yen ] (2.7%) higher, and Sony is 4050 yen of this 60 circle (1.5%) quantity the week before in the stock price morning of Toshiba.

Last Updated: October 13, 2003 22:11 EDT
Original article.

EDIT/ADDED:

http://www.rambus.com/news/pressrelease.cfm?id=111 said:
Toshiba and Rambus to Demonstrate World's Fastest Parallel Interface at Microprocessor Forum

Rambus's 6.4GHz "Redwood" parallel logic interface optimized for processor, chipset and network chip connections


MICROPROCESSOR FORUM, SAN JOSE,CA - October 13, 2003 - Rambus Inc. (Nasdaq:RMBS), a leading developer of chip-to-chip interface products and services, and Toshiba Corporation announced today that they will jointly demonstrate Toshiba's ASIC evaluation chip that incorporates Rambus's parallel logic interface codenamed Redwood.

This new evaluation chip is implemented on Toshiba's 90-nanometer ASIC process and is capable of running at speeds up to 6.4GHz, which is six times faster than processor busses available today. The chip is being used as a test vehicle for future customer platforms. The Redwood interface has been designed for high volume, cost-sensitive applications.

"We recognize Rambus as a leader in providing high-speed interface technology," said Yutaka Murao, general manager of Microprocessor Division, Semiconductor Company at Toshiba Corporation. "Since we signed the license agreement with Rambus in January 2003, our engineers have been working closely with Rambus to integrate the high-speed chip-to-chip Redwood interface into our evaluation chip. This early evaluation chip is an important step enabling our customers to maintain lower latency and lower power consumption than current solutions, while at the same time providing excellent cost efficiency. The flexible and scalable architecture of Redwood allows us to provide our customers with solutions optimized for their specific needs. Toshiba will apply the result of this joint effort, based on the most advanced process technologies including 65-nanometer, to leading system LSIs."

The Redwood parallel bus interface family addresses intra-board applications including processor, chipset and network chip connections. It is optimized for low latency and low power parallel bus applications, and enables high-pin bandwidth to reduce overall package, board, and system costs. Additionally, Redwood can be backwards compatible with existing LVDS-based standards such as HyperTransport, SPI-4 and RapidIO, allowing for easy integration into next-generation products. In order to achieve backwards compatibility, Redwood offers customers a range of frequency and voltage support.

"Toshiba has developed state-of-the-art technologies and processes that continue to make them a valuable partner for us in bringing next-generation, high-performance products to the marketplace," said Laura Stark, vice president of the Memory Interface Division at Rambus. "Toshiba's advanced ASIC technology, coupled with our Redwood interface, will help enable data rates of up to 6.4GHz to enable the fastest processor bus speeds available today."

Elements integrated into the Redwood technology include a 400MHz to 6.4GHz data rate range, low-voltage differential signaling, backwards compatibility with existing standards, FlexPhase adaptive timing circuit technology, and dynamic current and termination capabilities. Combined, all of these technologies allow Redwood to achieve up to 6.4GHz speeds for low-cost systems.

This new evaluation chip will be demonstrated at the Microprocessor Forum at the Fairmont Hotel in San Jose, California, October 13 - 16, 2003. Additional information on Redwood can be found at www.rambus.com/products/redwood/

About Rambus Inc.

Rambus is one of the world's leading providers of chip-to-chip interface products and services. The company's breakthrough technology and engineering expertise have helped leading chip and system companies to solve their challenging I/O problems and bring industry-leading products to market. Rambus's interface solutions can be found in numerous computing, consumer electronic and networking products. Additional information is available at www.rambus.com.

About Toshiba

Toshiba Corporation is a leader in the development and manufacture of electronic devices and components, information and communication systems, consumer products and power systems. The company's ability to integrate wide-ranging capabilities, from hardware to software and innovative services, assure its position as an innovator in diverse fields and many businesses. In semiconductors, Toshiba continues to promote its leadership in the fast growing system-on-chip market and to build on its world-class position in NAND flash memories, analog devices and discrete devices. Toshiba has approximately 166,000 employees worldwide and annual sales of over US$47 billion. Visit Toshiba's website at www.toshiba.co.jp/index.htm.
 
Panajev said:
Well, you can assume it will be practically free, but it is not exactly free. LS can only provide 256 bits per cycle. This basically means 2 registers
What I meant is "free" from memory dependency: load/store instructions have the same latency/throughput as normal register operations (even move reg->reg).
Obviously having more registers helps when you run out of load/store instruction slots. :p (and when optimizing VU loops, that is quite normal).
 
"ChryZ"

Thanks for the news :)
 