N5 to be PowerPC based

Status
Not open for further replies.
There are SEVERAL Blue Gene projects. P, C and L,
and there are preliminary stages to each project.

One of them will use a chip known as Cyclops, which
has 32 1 GFlop/s cores per chip as opposed to the just
announced system which hits 2 GFlop/s per chip,
assuming that they are reporting peak for DP.

They have straight FPUs and do a peak of one MADD
per cycle IIRC. So at 500 MHz they'll hit 32 GFlop/s
for Cyclops, substitute a SIMD single precision
FPU and you'd hit 128 GFlop/s peak at 500 MHz.
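
The peak numbers above are just cores x flops per cycle x clock.
A rough sketch of that arithmetic, assuming a MADD counts as
2 flops and that the hypothetical SIMD FPU is 4 lanes wide (my
assumption, it is what makes 128 come out). The function name is
made up for illustration:

```python
def peak_gflops(cores, flops_per_cycle, clock_mhz):
    """Theoretical peak in GFlop/s: cores x flops/cycle/core x clock."""
    return cores * flops_per_cycle * clock_mhz / 1000.0

MADD_FLOPS = 2   # one multiply-add counted as 2 floating-point ops
SIMD_LANES = 4   # assumed width of a single-precision SIMD FPU

# 32 cores at 500 MHz, one MADD per cycle per core
scalar_peak = peak_gflops(32, MADD_FLOPS, 500)
# same chip with each FPU swapped for the assumed 4-wide SIMD unit
simd_peak = peak_gflops(32, MADD_FLOPS * SIMD_LANES, 500)

if __name__ == "__main__":
    print(scalar_peak, simd_peak)
```

These are peak figures only, sustained rates would be lower.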

FWIW I think there's a fair chance of Sony getting
into the 128-512 GFlop range. A Teraflop looks
a bit out there, but if you read the patents it seems
clear that Sony intend to evolve the Cell architecture
and keep it compatible so maybe there will be a
bump to 1 TFlop/s late in the PS3's lifetime. :)

The PowerPC-ness of Cell/Xbox2/BlueGene and
possibly N5 is an obvious conclusion to draw as it's
IBM's 'house' architecture.

Is Cell related to BlueGene? Possibly, they share
similar goals in terms of architecture and performance
so you could easily imagine them becoming related,
which also applies to Xbox2 and N5.
 
hehe that's what i've been saying all along... they're gonna get so much cash from the whole Console market, since they are involved in some way or another in the making of each of the next gen consoles..... whoever wins, IBM will be LOADED. quite clever if you think about it...


IBM PowerPC in Xbox2
IBM PowerPC (?) in N5
IBM (STI) Cell in PS3

yeah
 
Remember Kutaragi Ken's presentation? Each rack contains 64 chips, and you can fit 8 racks per cabinet, so the total is 512 chips. But since each chip carries two processors, the total number of processors is 1024.

:LOL:

Yea and we all know how Cellular computing and SCEI/Toshiba's Cell are the same thing.

Could it be that Cell really = Blue Gene derived!

Derived in idea, not computing power. Take a look what SCEI's Cell looks like in their many patents. Don't even respond with an argument, because you'll be arguing with someone who has spent years studying the architecture 'Cell'. Which means; you lose.


BTW, you can forget about that 4 GHz imaginary monster called CELL, BlueGene/L clocks at 733 MHz.

Sony and Toshiba are looking to have the last laugh. And I imagine Come the 2004 time frame you will be about ready to pull a Cobain, the note and all.
 
Paul said:
Derived in idea, not computing power. Take a look what SCEI's Cell looks like in their many patents. Don't even respond with an argument, because you'll be arguing with someone who has spent years studying the architecture 'Cell'. Which means; you lose.

Aint gonna too. Not in me to try and pass off as some true blu tech dude. ;)

ANYWAY peeps, i read that this BlueGene/L can pack in so many cpus while still being aircooled because of the really low clockspeeds of the cpus...
4Ghz....hmmmm hmmm
 
Robert McMillan, IDG News Service

IBM has built a 512-node prototype of its Blue Gene L supercomputer that has been ranked as the 73rd most powerful computer in the world. The machine, which is capable of a peak performance of 2 trillion floating-point operations per second (teraflops), is about the size of a 30-inch TV.




The Blue Gene L supercomputer, which is being built by IBM for Lawrence Livermore National Labs, will be the first major system to be built under IBM's Blue Gene research project, which was launched in 1999. The project's goal is ultimately to build a computer capable of a petaflop, or one thousand trillion operations per second, about 25 times as fast as the most powerful computer today, the 41-teraflop Earth Simulator supercomputer.


The key to Blue Gene's ability to extract such performance out of such a small amount of real estate is the embedded PowerPC processor that IBM researchers have designed for the machine. Each Blue Gene chip contains dual floating-point processors, 4MB of L3 memory, and five network controllers.


System on a Chip

"It's really this system-on-a-chip technology," said Bill Pulleyblank, the director of exploratory server systems for IBM research.


The system-on-a-chip approach means that Blue Gene's nodes do not contain the kind of features typically found in commodity systems--disk drives or sound cards or microphone jacks--and require far less space and power than other computers. "You don't have a lot of extraneous stuff that you're trying to cool," said Don Dossa, a computational physicist with Lawrence Livermore who is working on the project. "We have processor, memory, and communications."


The 700-MHz processors have a peak power consumption on the order of 10 to 15 watts per node, said Dossa.

Blue Gene's heat management is further enhanced by a unique design that will give the supercomputer a tilted look, like a row of dominos simultaneously tilted to one side. "The real secret is by using these low-power processors and by doing some careful engineering on it, we're able to air-cool the machine," said Pulleyblank. Because of these two elements, Blue Gene requires about one-tenth the cooling of a typical supercomputer, he explained.

When Blue Gene L finally ships to Lawrence Livermore's Terascale Simulation Facility building a year from now, the 65,000-node machine will take up 2,500 square feet, less than one-tenth the area of the Earth Simulator, according to Dossa.



Software Side

Now that Blue Gene's hardware is working in prototype at IBM's Thomas J. Watson Research Center in Yorktown Heights, New York, the research team is turning its focus to developing the software tools that will allow applications to run across 65,000 processors at the same time.


"We have a number of challenges facing us in terms of usability," said Dossa. "A lot of the software people know that their algorithms will not scale that way, so they have to rethink their algorithms," he said.


Lawrence Livermore scientists plan to use the supercomputer to perform extremely accurate calculations on the dynamics of fluids and molecules at the atomic level, as well as dislocation dynamics, or the study of how certain materials can actually be strengthened through defects.


Some of Blue Gene L's software will eventually be used as part of a larger application designed to simulate nuclear explosions under the U.S. Government's Stockpile Stewardship program, said Mark Seager, Lawrence Livermore's principal investigator for ASCI (Accelerated Strategic Computing Initiative) platforms.


"Our strategy for Blue Gene, actually, is to focus on some of the key science aspects as separate entities, and then later factor that into the weapons code for the next-generation machine," he said.


Seager's plans also depend on IBM's ability to build a working system that is 128 times larger than its current prototype.


Seager believes that the potential cost savings are worth the risk. Blue Gene L, which will deliver between 180 teraflops and 360 teraflops, will cost between $50 million and $100 million to complete, or about $200,000 per teraflop, Seager said. Lawrence Livermore's ASCI Purple supercomputer, by comparison, will cost between $1 million and $2 million per teraflop, he said.





While Seager characterizes Blue Gene L as a "high-risk" attempt to build an affordable supercomputer, it is one that could help the lab's work on the Stockpile Stewardship program. "At this point, we're confident that it will have an impact on the program," he said.
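
The figures in the article can be cross-checked with a few lines of arithmetic. This is only a sketch using the numbers quoted above; the variable and function names are made up for illustration, and the full node count is derived from the "128 times larger" remark rather than taken from any spec sheet.

```python
PROTOTYPE_NODES = 512
SCALE_UP = 128                          # "128 times larger than its current prototype"
full_nodes = PROTOTYPE_NODES * SCALE_UP # the article rounds this to "65,000-node"

def cost_per_teraflop(cost_usd, teraflops):
    """Dollars per peak teraflop."""
    return cost_usd / teraflops

# Cost range ($50M-$100M) against the performance range (180-360 teraflops)
cheapest = cost_per_teraflop(50e6, 360)
priciest = cost_per_teraflop(100e6, 180)

# One petaflop (1000 teraflops) versus the 41-teraflop Earth Simulator
petaflop_vs_earth_simulator = 1000 / 41

if __name__ == "__main__":
    print(full_nodes, round(cheapest), round(priciest),
          round(petaflop_vs_earth_simulator, 1))
```

The quoted "$200,000 per teraflop" sits inside the range this gives, and the petaflop target works out to a bit over 24x the Earth Simulator, in line with "about 25 times".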
 
4Ghz....hmmmm hmmm

I don't anticipate 4Ghz for a Cellular PS3 CPU, but then again; who anticipated PSP doing 33 million polygons raw?

The unexpected can happen, just look at PSP's specs and MS using PowerPC.
 
read above.
multicore at 4ghz seems like a toaster...and it be like the BG/Cell multicore system works with many cheap/lowclocked cpus..
 
While i still stand by my end results matter mantra, Juz asking...Did KK ever say 1 TFLOPS be for their next gaming machine? cant recall all their hypey talk, but i only be thinking that they juz said they be aiming for 1000X PS2... :?: :?: :?:
 
July 2003 EGM Okamoto is quoted as saying it. Sorry, I don't have a scan.

That being said I only expect to see 256 GFLOPS, which is still a staggering amount, but this doesn't mean I have written off 1 TFLOPS as impossible.
 
That being said I only expect to see 256 GFLOPS, which is still a staggering amount, but this doesn't mean I have written off 1 TFLOPS as impossible.

if DM is to be believed (and he has a rather interesting take on it), it would not likely be even that, nor would it ever reach anything near good performance in the short term.
 
MfA said:
If mr. Irving Wladawsky-Berger was quoted correctly, and knew what he was talking about, the Cell from the patent is history.

I think you're very mistaken. The Broadband Engine is a Toshiba/Sony offshoot of this program, ergo the terms of the Rambus licensing deal with SCE/Toshiba.

IMHO, there are some major misconceptions being perpetuated here by people on both sides of the argument; to the point where it's not even worth addressing them. Time will show.
 
512 nodes, 700 MHz per node... 2 TFLOP

This would mean about 4 GFLOPS per node, and at 700 MHz this means about 5.7 FP ops per cycle per node: let's say 6 as we cannot have fractions ( this would be 2.1 TFLOPS, but 2 TFLOPS might be a more realistic peak even theoretically speaking ).

4 FP ops from the Worker core ( dual FPU remember the quoted text you posted chap ? ) and 2 FP ops from the director core which would have an FPU ( all in each node, following BlueGene/L model ).
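
A quick sketch of the per-node arithmetic, using only the figures from the article ( 2 TFLOPS peak, 512 nodes, 700 MHz ). The function names are mine and purely illustrative:

```python
def flops_per_cycle_per_node(system_tflops, nodes, clock_mhz):
    """FP ops each node must retire per clock to reach the system peak."""
    per_node_flops = system_tflops * 1e12 / nodes
    return per_node_flops / (clock_mhz * 1e6)

def system_peak_tflops(nodes, flops_per_cycle, clock_mhz):
    """System peak in TFLOPS for a given per-node FP ops/cycle figure."""
    return nodes * flops_per_cycle * clock_mhz * 1e6 / 1e12

# What the quoted 2 TFLOPS implies per node and per cycle
implied = flops_per_cycle_per_node(2.0, 512, 700)
# What rounding that up to 6 FP ops/cycle gives back at system level
rounded_up_peak = system_peak_tflops(512, 6, 700)

if __name__ == "__main__":
    print(round(implied, 2), round(rounded_up_peak, 2))
```

The implied figure comes out between 5 and 6, which is why rounding up to 6 FP ops/cycle overshoots the quoted 2 TFLOPS slightly.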

I think 2-2.5 GHz for 65 nm CELL is possible ( the target would be 500-640 GFLOPS ).

Each APU can do 8 FP ops/cycle and this would mean being 1.34x faster per cycle than each of BlueGene/L's nodes.

Our peak is around 1/4th ( 500 GFLOPS, 2 GHz ) of the peak this BlueGene/L based super-computer is talking about: it would be more if we count 2.1 TFLOPS for that BlueGene super-computer ( we counted 6 FP ops/cycle for it after-all and at 700 MHz and 512 nodes 2.1 TFLOPS is what comes out of the calculator ).

So we can modify 1/4 to 1/(4.2).

This means that so far we need ~5.628x less chips ( each chip = 1 APU ).

Our clock-frequency is also ~2.857x higher ( 2 GHz for the APU vs 700 MHz for the BlueGene/L nodes ).

This means ~16.08x less chips needed in total.

This brings us around the need for ~31.84 chips or 32 chips to use integer values.

Each chip in our calculation was 1 APU.

So, 32 APUs...

This means 4 PEs and 8 APUs per PE, which even Deadmeat agrees is possible ( yes, even he finally agreed to it ).
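
The chain of ratios above can be written out as a small sketch. All inputs are the assumptions made in this post ( 6 FP ops/cycle per BlueGene/L node, 8 FP ops/cycle per APU at 2 GHz, a 500 GFLOPS target ), not confirmed specs, and the names are made up for illustration:

```python
import math

BGL_NODES = 512
BGL_FLOPS_PER_CYCLE = 6      # rounded-up per-node figure assumed in the post
BGL_CLOCK_GHZ = 0.7
APU_FLOPS_PER_CYCLE = 8      # assumed APU throughput
APU_CLOCK_GHZ = 2.0          # assumed APU clock
TARGET_GFLOPS = 500.0        # assumed CELL peak target

def apus_via_ratios():
    """Scale the 512 nodes down by peak ratio, per-cycle ratio and clock ratio."""
    bgl_peak_gflops = BGL_NODES * BGL_FLOPS_PER_CYCLE * BGL_CLOCK_GHZ
    peak_ratio = bgl_peak_gflops / TARGET_GFLOPS
    per_cycle_ratio = APU_FLOPS_PER_CYCLE / BGL_FLOPS_PER_CYCLE
    clock_ratio = APU_CLOCK_GHZ / BGL_CLOCK_GHZ
    return BGL_NODES / (peak_ratio * per_cycle_ratio * clock_ratio)

def apus_direct():
    """Same answer the short way: target divided by per-APU peak."""
    return TARGET_GFLOPS / (APU_FLOPS_PER_CYCLE * APU_CLOCK_GHZ)

apus_needed = math.ceil(apus_direct())  # whole APUs only

if __name__ == "__main__":
    print(apus_via_ratios(), apus_direct(), apus_needed)
```

Without rounding the intermediate ratios this gives 31.25 rather than ~31.84, but it still rounds up to 32 APUs, i.e. 4 PEs x 8 APUs.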
 
Size considerations:

We have 4 MB of L3 memory ( e-DRAM ? ) in each node and we have 512 nodes.

This means 2 GB of L3 memory in the system counting all the 512 nodes.

To keep things fair the Broadband Engine chip ( Fig. 6 in Suzuoki's patent ) would have 4 MB of SRAM based LS and 16-32 MB ( the patent hinted up to 64 MB ) of e-DRAM.

That would make 20-36 MB of memory on the whole Broadband Engine.

This means over 56x less space used for memory and we can also reduce the network interfaces.
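
The memory comparison as a sketch, treating "space" simply as megabytes and ignoring the SRAM vs e-DRAM density difference ( a simplification on my part ). The Broadband Engine sizes are the guesses from this post:

```python
BGL_L3_MB_PER_NODE = 4
BGL_NODES = 512
bgl_total_mb = BGL_L3_MB_PER_NODE * BGL_NODES   # 2 GB across the whole system

BE_LS_MB = 4                 # SRAM local storage, as guessed above
BE_EDRAM_MB = (16, 32)       # low and high e-DRAM guesses

be_total_mb = [BE_LS_MB + edram for edram in BE_EDRAM_MB]

# Ratio against the largest and the smallest Broadband Engine configuration
ratio_min = bgl_total_mb / max(be_total_mb)
ratio_max = bgl_total_mb / min(be_total_mb)

if __name__ == "__main__":
    print(bgl_total_mb, be_total_mb, round(ratio_min, 1), round(ratio_max, 1))
```

So the 56x figure is the conservative end, taken against the 36 MB configuration.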

I also do not think they are using 65 nm technology for this 2 TFLOPS BlueGene/L system.
 
...

I think 2-2.5 GHz for 65 nm CELL is possible
1. PPC440 cannot be clocked that high. (How many times must I repeat this?)
2. High-clockspeed goes against the very design philosophy behind CELL; use of lots of low power processors to achieve performance.

Our clock-frequency is also ~2.857x higher ( 2 GHz for the APU vs 700 MHz for the BlueGene/L nodes ).
Keep dreaming....

This means 4 PEs and 8 APUs per PE, which even Deadmeat agrees is possible ( yes, even he finally agreed to it ).
Do not twist my word. I stated that the maximum possible number of PE per chip was 2 for superpipelined PEs and 4 for short-pipelined PEs. The peak FLOPS possible stays constant at around 256 GFLOPS in either case, only the power consumption level changes with the short-piped PE version generating far less heat and more suitable for consumer applications.

256 GFLOPS is the theoretical peak possible for EE3 assuming Kutaragi is willing to lose as much money on the first batch of EE3s as it did on EE1. You can wake up from your teraflop dreams now.
 
Wow, a small snowstorm has brewed while I've been slaving away with my research group...

I don't see how this is all that new and insightful... so Sony and Nintendo will use Blue Gene/L derivatives....so... it will use PPC cores. Um. Didn't we know that already?

The reason the flop counts are low is because they don't have the vector units (I think, I've been searching the IBM site, but if someone finds out about this, post a link please.) This also does not take into account process technology that will be available later on.

What IS encouraging is that the basic architecture is now in an actual machine, which means software development, which is critical for multiprocessing, can proceed. Obviously, the PS3 implementation will be different, but I think the most pressing challenges can first be tackled on this, making writing PS3 software that much faster and easier.

DGMA, could you please stop with the "I've been right all along, you guys wouldn't listen to me". First of all, your posts have been everywhere, and because you can point out 3 posts out of the hundreds of CELL posts you have doesn't particularly impress me. Secondly, just last week (I think), you posted that a CELL would have 1 GHz, simple VUs, and 250 GFlops. Now you're saying that it would require 500 processors to hit 1TFlops. Being "right all along" implies consistency, and you're not consistent.

(I also noticed that you made your Deadmeat's past CELL = BlueGene/L posts revisited about 10 hours after the news first hit (according to Google News), so that doesn't particularly impress me either.)

But let me be the first to say that of all the people here, IMHO, you (DGMA) have made the clearest connection between CELL and Blue Gene/L. While you've made many wrong connections, and however inconsistent even this one was, it turned out to be true. My hat's off to you for that.
 
Re: ...

DeadmeatGA said:
I think 2-2.5 GHz for 65 nm CELL is possible
1. PPC440 cannot be clocked that high. (How many times must I repeat this?)
2. High-clockspeed goes against the very design philosophy behind CELL; use of lots of low power processors to achieve performance.

Our clock-frequency is also ~2.857x higher ( 2 GHz for the APU vs 700 MHz for the BlueGene/L nodes ).
Keep dreaming....

This means 4 PEs and 8 APUs per PE, which even Deadmeat agrees is possible ( yes, even he finally agreed to it ).
Do not twist my word. I stated that the maximum possible number of PE per chip was 2 for superpipelined PEs and 4 for short-pipelined PEs. The peak FLOPS possible stays constant at around 256 GFLOPS in either case, only the power consumption level changes with the short-piped PE version generating far less heat and more suitable for consumer applications.

256 GFLOPS is the theoretical peak possible for EE3 assuming Kutaragi is willing to lose as much money on the first batch of EE3s as it did on EE1. You can wake up from your teraflop dreams now.

The chip should be able to clock at 2 GHz: you assume 10-15 Watts ( as you did ) for a 1 GHz clock-speed... come on... are you telling me that they cannot double the clock speed and keep Power Dissipation under 45-50 Watts ?

Do you assume that they will not put some heatsinks and fans ? I mean your initial assumption is between 10-15 Watts and that is close enough to the 7 Watts barrier at which point you can avoid fans practically.
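
To put rough numbers on the doubling argument: dynamic CMOS power goes roughly as frequency times voltage squared. The sketch below uses that rule of thumb only, it ignores leakage, and the 1.2x voltage bump is an arbitrary value picked for illustration, not something from any datasheet:

```python
def scaled_power(base_watts, freq_ratio, voltage_ratio=1.0):
    """Dynamic power rule of thumb: P scales with f * V^2 (leakage ignored)."""
    return base_watts * freq_ratio * voltage_ratio ** 2

BASE_RANGE_W = (10.0, 15.0)   # the 10-15 Watts assumed at 1 GHz

# Doubling the clock at the same supply voltage
same_voltage = [scaled_power(w, 2.0) for w in BASE_RANGE_W]
# Doubling the clock with a hypothetical 20% higher supply voltage
higher_voltage = [scaled_power(w, 2.0, 1.2) for w in BASE_RANGE_W]

if __name__ == "__main__":
    print(same_voltage, higher_voltage)
```

Under these assumptions both cases stay below 45 Watts, but a real design that needs a bigger voltage increase or leaks more would not follow this simple curve.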

You fell down from your soapbox you built to tell the world that each PU HAD to be a bloated G4 core, so before knocking other people's ideas down "because you said so", I would suggest you wait.

Also you can keep the APUs 2x faster than the PUs clock-wise: nowhere in the patent does it specify that they have to have the same clock-speed.

You assume the PPC core they use for PU ( it does not have to be the same exact one they used for BlueGene/L even though a simple PPC core should not be so complex in terms of structure and of instructions: allowing for a faster clock-speed ) can clock at 1 GHz, but not at 2 GHz.

Why ?

This BlueGene/L uses 700 MHz PPC cores and it is not even using 65 nm manufacturing technology ( maybe it is using SOI, but I am not sure ): the major problem is that they need 512 nodes and each node has basically three FPUs ( going by their FLOPS rating ) and 4 MB of L3 Cache...

If each chip ran at like 1.5 GHz that would mean increase in power consumption * 512.

STI CELL uses only 32 APUs, 4 PUs and 4 DMACs basically versus 512 Director PowerPC cores and 512 Worker PowerPC cores: 40 units vs 1,024 processors.

I do not understand why it would need to have such a long pipeline to allow the APUs to run at 2 GHz ( 65 nm technology and SOI ) as you are saying: I also do not understand why modifications in the pipeline would be so large as to cut in half the number of PUs and APUs they can fit in the same chip.
 
The chip should be able to clock at 2 GHz: you assume 10-15 Watts ( as you did ) for a 1 GHz clock-speed... come on... are you telling me that they cannot double the clock speed and keep Power Dissipation under 45-50 Watts ?

It's not that simple, though.
 
V3 said:
The chip should be able to clock at 2 GHz: you assume 10-15 Watts ( as you did ) for a 1 GHz clock-speed... come on... are you telling me that they cannot double the clock speed and keep Power Dissipation under 45-50 Watts ?

It's not that simple, though.

I know that, V3, but I do think that there should be no extreme power consumption issues or pipeline length issues blocking the way to 2 GHz: we would need a longer pipeline maybe ( a few more stages ), but not to the extent that we could only pack half the APUs and half the PUs.
 
...

nondescript

DGMA, could you please stop with the "I've been right all along, you guys wouldn't listen to me".
I can't, because I've been right all along...

Secondly, just last week (I think), you posted that a CELL would have 1 GHz, simple VUs, and 250 GFlops.
And that's exactly what EE3 will have. 256 GFLOPS or less depending on how much money Kutaragi is willing to lose(which I cannot calculate nor estimate)

Now you're saying that it would require 500 processors to hit 1TFlops.
No I have not.

From my older post

Like the BlueGene/L node which inspired the CELL core, each CELL core is built around a single PPC core serving as the I/O engine, while 8 VUs handle the computational tasks dispatched from the Linux kernel. It is the separation of kernel and application process that sums up the CELLULAR COMPUTING design philosophy.

Go back to older posts and you will see the recurring theme of "CELL is a modified BlueGene/L with compute engine replaced by vector units to boost floating point performance".

To Panajev

The chip should be able to clock at 2 GHz
Sure, if SCEI went superpipelining(and jacked up the transistor count). Then SCEI would put fewer PEs into a die.

You fell down from your soapbox you built to tell the world that each PU HAD to be a bloated G4 core
Because I too was assuming a high clock speed (Damn that Suzuoki patent) of around 2~3 GHz, which requires the use of a powerful core to serve 8 APUs.

Also you can keep the APUs 2x faster than the PUs clock-wise: nowhere in the patent does it specify that they have to have the same clock-speed.
Still keeping the teraflop dream alive, I see.

This BlueGene/L uses 700 MHz PPC cores and it is not even using 65 nm manufacturing technology ( maybe it is using SOI, but I am not sure )
Clock speed is a design feature and not a process feature, how many times must I repeat it before it goes through your head???

I do not understand why it would need to have such a long pipeline to allow the APUs to run at 2 GHz ( 65 nm technology and SOI )
Because a short pipeline must do more work per stage, so each stage takes longer to complete. Clock speed is a design feature and not a process feature.

as you are saying: I also do not understand why modifications in the pipeline would be so large as to cut in half the number of PUs and APUs they can fit in the same chip.
Recall the transistor count jump from P3 to P4; Intel tripled the transistor count to stretch the pipeline from the P3's 13 stages to the P4's 25 stages.

assume 10-15 Watts ( as you did ) for a 1 GHz clock-speed... come on... are you telling me that they cannot double the clock speed and keep Power Dissipation under 45-50 Watts ?
Look at Efficeon; it burns 7 watts at 1.1 Ghz, but burns 25 watts at 2 Ghz. 40~50 watts will probably take it to 2.3 Ghz

You fell down from your soapbox you built to tell the world that each PU HAD to be a bloated G4 core
Which I corrected after Kutaragi showed how he intended to reach 2 TFLOPS; by putting 64 chips on a rack.

You assume the PPC core they use for PU ( it does not have to be the same exact one they used for BlueGene/L even though a simple PPC core should not be so complex in terms of structure and of instructions: allowing for a faster clock-speed ) can clock at 1 GHz, but not at 2 GHz. why ?
Because it is a short-piped design.
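
The short-pipe argument can be put as a toy model: the clock period is the total logic delay divided by the number of stages, plus a fixed per-stage latch overhead. Every number below is made up purely to show the shape of the relationship, none of them describes a real PPC440, APU or Pentium:

```python
def max_clock_mhz(total_logic_ns, stages, latch_overhead_ns):
    """Toy model: period = logic delay per stage + fixed latch overhead."""
    period_ns = total_logic_ns / stages + latch_overhead_ns
    return 1000.0 / period_ns

TOTAL_LOGIC_NS = 7.0      # hypothetical delay through all the logic
LATCH_OVERHEAD_NS = 0.2   # hypothetical per-stage latch/skew overhead

short_pipe = max_clock_mhz(TOTAL_LOGIC_NS, 7, LATCH_OVERHEAD_NS)
long_pipe = max_clock_mhz(TOTAL_LOGIC_NS, 20, LATCH_OVERHEAD_NS)

if __name__ == "__main__":
    print(round(short_pipe), round(long_pipe))
```

In this model more stages means less work per stage and a higher clock, with diminishing returns as the latch overhead starts to dominate. It says nothing about the extra transistors the deeper pipeline costs.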
 