Check it out: Merrimac a 128 GFlop/s stream processor

Status
Not open for further replies.
a $20K 2 TFLOPS workstation

They should have just waited another year and got a PS3 instead for $300 :LOL:

Why are you guys posting this thing on the console forum ? On its own thread too, not on the official one that was just made ?
 
V3 said:
a $20K 2 TFLOPS workstation

They should have just waited another year and got a PS3 instead for $300 :LOL:

To a certain extent this is very probable. Heck, I'm sure the same could have been said of the EE in 1998 when looking at the contemporary workstations.

If anything, this article is nothing but promising news for those who feel the Broadband Engine is feasible. When you look at what they did (200 64bit FPUs, the number of 64bit registers, the clock and thermal output) at 90nm for a 200mm^2 IC, it's golden.

And when you look at the STI advantage in geometry & process technology such as PD-sSOI (AMD is a useful metric) and their dielectrics, et al. - it's looking really, really good. And at a size that's reasonable and in line with my thinking - as opposed to the ~400mm^2 chatter.
 
nondescript said:
Panajev2001a said:
Nondescript, of course that professor is only a crazy EE guy with no formal training in CS and that went to university in the 1910s...

Not only that, he was Ken Kutaragi's roommate.

How can he go to university in the 1910s and be KK's roommate...? Ringu.. :devilish:
 
...

Pana

Well... your estimate was 10-15 Watts for 32 of them at 1 GHz...
I expected 1 PE per chip each delivering 32~64 GFLOPS peak.

Are you saying they will break the 60 Watts barrier to go at 2 GHz ?
This Merrimac thing probably can't even reach 2 GHz.

I can start now and count for you the features of MIPS and ARM cores and end up with 1 mm^2 of space
That's just for the integer unit. Add in the FPU, MMU, DMAC, and memory controllers and the size swells to 20 mm^2 even on a sub-90 nm process.

10 MTransistors for 128 KB of SRAM and 2 KB worth of Registers ? Are you for real ?
Yap. 10 million transistors of 128 KB SRAM, register files not included.

I cannot count more than 780-1 MTransistors for the 128 KB of LS (no need for cache tags as it is not a cache).
Actually this is a cache with writeback disabled. Each line also carries the synchronization information, so the overhead is as much as, if not greater than, regular cache.
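
For a rough sanity check on the two figures above, here is a minimal sketch that counts only the bit cells of a 128 KB SRAM. It assumes a standard 6-transistor (6T) cell and ignores decoders, sense amplifiers, tags and any per-line synchronization bits, which is exactly the overhead being argued about. The function name and the 6T assumption are illustrative and do not come from either post.

```python
def sram_cell_transistors(kilobytes, transistors_per_cell=6):
    """Transistors in the bit cells alone (assumes 6T cells, no periphery)."""
    bits = kilobytes * 1024 * 8
    return bits * transistors_per_cell


cells_only = sram_cell_transistors(128)
claimed = 10_000_000  # the 10 million figure quoted in the thread

print(f"128 KB at 6T per cell: {cells_only / 1e6:.2f} M transistors")
print(f"Overhead implied by the 10 M claim: {claimed / cells_only - 1:.0%}")
```

Under these assumptions the bare cells come to about 6.3 million transistors, so the 10 million figure would imply roughly 59% on top of that for everything else.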

Also ever thought that BlueGene/L and supercomputers in general work at different problems than a processor designed for multi-media and 3D graphics processing ?
CELL is not a rasterizer. It handles linear vectors just like supercomputers do.

you are not looking that much smarter by not admitting to be ever wrong...
Because I am not wrong. I don't even reply if I feel I am wrong.

smart people learn from their often mistakes.
Yap. You should listen to your own advice and admit the teraflop CELL chip is indeed looking more like science fiction with each passing day.

They should have just waited another year and got a PS3 instead for $300
But you need to buy 8 of them at $3200 to reach the paper teraflop. Again, IBM would rather spend $100K to reach 1 teraflop than to buy 8 PSX3s and link them together. What does IBM know that you don't????
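
To put the prices quoted in this thread side by side, here is a small sketch that works out dollars per peak teraflop. The inputs are only the figures posted above ($20K for 2 TFLOPS, $3200 for eight consoles at 1 TFLOPS, $100K for 1 TFLOPS). They are paper peak numbers taken at face value, not measured performance, and the labels are just for illustration.

```python
# (price in dollars, peak TFLOPS) as quoted in the thread, not verified
systems = {
    "Merrimac workstation": (20_000, 2.0),
    "8 x PS3 (paper peak)": (3_200, 1.0),
    "IBM figure": (100_000, 1.0),
}


def dollars_per_tflops(price, tflops):
    """Cost of one peak teraflop at the quoted price."""
    return price / tflops


for name, (price, tflops) in systems.items():
    print(f"{name}: ${dollars_per_tflops(price, tflops):,.0f} per peak TFLOPS")
```

This only compares peak arithmetic per dollar. It says nothing about memory bandwidth, interconnect or sustained throughput, which is where the disagreement in this thread really sits.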
 
Vince

If anything, this article is nothing but promising news for those who feel the Broadband Engine is feasible. When you look at what they did (200 64bit FPUs, the number of 64bit registers, the clock and thermal output) at 90nm for a 200mm^2 IC, it's golden.
I would like to point out that you are missing 32 full-blown CPU control units, Load/Store units, cache controllers, and three additional PowerPCs from your estimate.... Throw them in and you will be getting a die size like 450~500 mm^2 and burn God knows how many watts...
 