ShenWei SW1600 -- Made in China 16-core CPU

fellix

Veteran
Red China is again placing itself in the Top 500 with another beast, the BlueLight MPP supercomputer, but this time with homemade high-performance CPUs at its heart.

The ShenWei SW1600 processor, the third generation of an internally developed CPU architecture, is quite a leap from the two previous models. Looking at the specs, though, one would notice some close similarities with the good old Alpha 21164 from DEC. It is essentially 16 Alpha cores crammed onto a monolithic die.

A blog post with specs here: http://laotsao.wordpress.com/2011/10/29/sw1600-and-alpha-21164/

Pictures and block diagrams of the CPU and the HPC impl: http://laotsao.wordpress.com/2011/10/29/sunway-bluelight-mpp-神威蓝光/
 
Is it really based on the Alpha 21164? The 21164 is more than 15 years old, IIRC. It also does only 2 FLOPs per cycle, so it would need some vector units to reach 140.8 GFLOPS @ 1.1GHz, even with 16 cores, if it's really based on the 21164.
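To put a number on that gap, here is a quick back-of-the-envelope check using only the figures quoted above (140.8 GFLOPS, 16 cores, 1.1GHz, 2 FLOPs per cycle for a 21164-style core). The variable names are made up for illustration.

```python
# Figures quoted in the post above.
peak_gflops = 140.8   # claimed peak per chip
cores = 16
clock_ghz = 1.1

# FLOPs each core must retire per cycle to hit the claimed peak.
flops_per_cycle_per_core = peak_gflops / (cores * clock_ghz)

# Peak a chip of 16 plain 2-FLOP/cycle cores would reach at the same clock.
scalar_peak_gflops = 2 * cores * clock_ghz

print(f"{flops_per_cycle_per_core:.1f} FLOPs/cycle/core needed")
print(f"{scalar_peak_gflops:.1f} GFLOPS with 2 FLOPs/cycle/core")
```

So the claimed peak needs about 8 FLOPs per cycle per core, four times what a 2-FLOP-per-cycle core would deliver.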

According to this, its Rmax is 795.9 TFLOPS against an Rpeak of 1070.16 TFLOPS, so its LINPACK efficiency is 74.37%, which is comparable to similar MPP supercomputers (and better than GPU-accelerated supercomputers). The K computer is an exception, with more than 90% LINPACK efficiency.

Its claimed power consumption is 1074 kW. That puts its Rmax/W in the league of GPU-accelerated supercomputers and the K computer, and ahead of other MPP supercomputers.
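For reference, both ratios can be reproduced from the quoted numbers alone. This is just the arithmetic, a rough sketch with illustrative variable names; no figures beyond Rmax, Rpeak and the claimed power are assumed.

```python
# Figures quoted in the post above.
rmax_tflops = 795.9
rpeak_tflops = 1070.16
power_kw = 1074.0

# LINPACK efficiency: sustained performance over theoretical peak.
linpack_efficiency = rmax_tflops / rpeak_tflops

# Rmax per watt, converting TFLOPS to MFLOPS and kW to W.
mflops_per_watt = (rmax_tflops * 1e6) / (power_kw * 1e3)

print(f"LINPACK efficiency: {linpack_efficiency:.2%}")
print(f"Rmax/W: {mflops_per_watt:.0f} MFLOPS/W")
```

That works out to roughly 741 MFLOPS per watt.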

I think we'll be able to find out more details when the next Top500 list is published later this month.
 
Something very interesting from the second link:

Desktop

ATX design
2-4 SATA, 3Gbps
1 PATA UDMA
2 GbE
4 USB 2.0
1 PCI-E x16, 3 PCI-E x4, 2 32-bit PCI
office

I'm sure this will get quite a few Linux geeks interested: here's finally a non-PC computer that can use bog-standard power supplies, cases and drives. Good old PATA is here too.
 
Well... I believe it doesn't matter that it is 15 years old.

Setting aside that DEC brought us clustering, which was first used commercially in Bulldozer, Intel's SCC as well as LRB are based on the P54C, which is 17 years old.

I'm waiting for the new Top500 list too, mostly looking forward to the IBM supercomputer.
 
I still have my doubts. The report about "SW based on Alpha 21164" is actually very old (from 2000, IIRC). Since the SW1600 is the third generation, it's possible that the core, even if it still uses the Alpha ISA, is not based on the old 21164 core. Of course, since it's only designed to run at 1.1GHz, using the 21164 is not impossible either (a version of the 21164A runs @ 600MHz on a 0.28 µm process IIRC, so it should be able to run @ 1.1GHz with ease on a current process).
 
Well... maybe they just added a vector unit? 8 DP FLOPs per cycle sounds like it is based on a 256-bit vector unit.
 
Of course, and it's more likely to be a 4D FMA or two 4D units (FMUL + FADD), as the 21164's FPU works in the latter configuration.
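Either layout gets to the same 8 DP FLOPs per cycle. Here is a small sketch of the counting, assuming 64-bit double-precision lanes in a 256-bit unit and one issue per pipe per cycle; these are assumptions for illustration, not confirmed details of the SW1600.

```python
vector_bits = 256
dp_bits = 64                      # one double-precision value
lanes = vector_bits // dp_bits    # 4 DP lanes, i.e. "4D"

# Option 1: a single 4-wide FMA pipe, counting multiply + add as 2 FLOPs per lane.
fma_flops_per_cycle = lanes * 2

# Option 2: separate 4-wide FMUL and FADD pipes, each issuing every cycle.
split_flops_per_cycle = lanes + lanes

print(lanes, fma_flops_per_cycle, split_flops_per_cycle)
```

The two options differ in how easily real code keeps both halves busy, not in peak rate.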
 
In related news, Fujitsu announced a new supercomputer system, PRIMEHPC FX10, with performance of up to 23.2 PFLOPS in the largest configuration.

It uses a new CPU named SPARC64 IXfx, derived from the SPARC64 VIIIfx used in the K computer. Basically, SPARC64 IXfx is fabbed on a smaller process (40nm vs 45nm), so it has 16 cores instead of 8, with memory bandwidth of up to 85GB/s per chip. TDP is 110W.

The new system uses the same Tofu interconnect as the K computer. The next SC Conference (starting this Saturday) is going to be interesting :)

News at PC Watch (Japanese)
 
Oh, that's great.
The No. 1 systems on the last two TOP500 lists both brought great improvements, and it looks like that will keep going this time.
 
(Japanese)
Oh don't worry, hardware pr0n is universal. :D

That watercooling setup looks bizarre. Not only are the pipes routed very weirdly, there are 4 supposedly 110W chips per cooling loop. Unless that water is actually liquid freon, I don't see how those tiny tubes could keep the last CPU in particular at a decent temp...
 
Something doesn't quite seem to add up. 40nm vs. 45nm isn't exactly a large shrink. The 8-core VIIIfx was ~500mm² according to some Fujitsu slides, which would make a 16-core version monstrously large even at 40nm.
Maybe they were somehow able to pack transistors way more densely than the shrink alone suggests?
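A naive estimate shows the size of the problem. This sketch assumes die area scales with the square of the node name and that doubling the cores doubles the whole die; both are crude assumptions, and the second overestimates, since uncore logic would not double.

```python
old_area_mm2 = 500.0               # approximate 8-core VIIIfx die size from the post
old_node_nm, new_node_nm = 45, 40

# Ideal area scaling for a pure optical shrink.
area_scale = (new_node_nm / old_node_nm) ** 2

# Doubling everything, then applying the ideal shrink.
naive_16core_area_mm2 = 2 * old_area_mm2 * area_scale

print(f"area scale: {area_scale:.2f}")
print(f"naive 16-core die: {naive_16core_area_mm2:.0f} mm^2")
```

Under those assumptions the shrink alone leaves a die of roughly 790mm², so something other than the node change has to account for the difference.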
 
It's 2 CPUs per loop. The other four water blocks are covering a bunch of interfacing logic/controllers. If the coolant were freon, the pipes would need bulky insulation to prevent condensation.
 
Yeah, apparently they did increase density quite a bit. It's also possible that the old chip was designed with power efficiency in mind (Fujitsu claims that SPARC64 VIIIfx's TDP is only 58W @ 30℃ running @ 2GHz). The new chip is actually even smaller (484mm^2 vs 513mm^2) while having twice as many cores and more than twice the amount of L2 cache (12MB shared vs 5MB shared). The number of transistors has also more than doubled.

However, the new chip runs significantly hotter. It runs at 80℃ air cooled and 50℃ water cooled. TDP is also roughly doubled, although the frequency is actually lower (1.8GHz vs 2GHz). So I guess they decided to go with a better cooling system instead of just making a cooler chip to maintain power efficiency.
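Putting the quoted figures side by side gives a rough comparison. It assumes both chips do the same number of FLOPs per cycle per core, which the thread does not confirm, and the two TDP figures were quoted under different conditions, so treat the last ratio as indicative only.

```python
# Figures quoted in this post; dictionary keys are illustrative.
old = {"cores": 8,  "area_mm2": 513.0, "ghz": 2.0, "tdp_w": 58.0}   # SPARC64 VIIIfx
new = {"cores": 16, "area_mm2": 484.0, "ghz": 1.8, "tdp_w": 110.0}  # SPARC64 IXfx

# Cores per unit area, new relative to old.
core_density_ratio = (new["cores"] / new["area_mm2"]) / (old["cores"] / old["area_mm2"])

# Relative peak throughput, ASSUMING equal FLOPs per cycle per core.
throughput_ratio = (new["cores"] * new["ghz"]) / (old["cores"] * old["ghz"])

tdp_ratio = new["tdp_w"] / old["tdp_w"]
perf_per_tdp_ratio = throughput_ratio / tdp_ratio

print(f"core density: x{core_density_ratio:.2f}")
print(f"throughput:   x{throughput_ratio:.2f}")
print(f"TDP:          x{tdp_ratio:.2f}")
print(f"perf per TDP: x{perf_per_tdp_ratio:.2f}")
```

By this estimate core density slightly more than doubles, while peak throughput per watt of TDP stays about the same or dips slightly.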

I found a white paper by Fujitsu about its hardware here (this time in English :) )
 
The latest Top500 list has been released.

The K computer has been completed and now actually produces more than 10 PFLOPS running LINPACK (and thus lives up to the name "Kei" computer, which means 10,000 trillion). It's also the most power-hungry supercomputer, drawing more than 12MW, but thanks to its high performance it is also among the most power-efficient supercomputers.

The "Sunway Blue Light" ranks at 14th.

There are also many new Bulldozer-based and Sandy Bridge EP-based supercomputers. The fastest Bulldozer-based supercomputer ranks 12th, and the fastest Sandy Bridge EP-based one ranks 27th.
 
I've always wondered to what degree mainland China is able to utilize all that computing power in real productive projects. Do they have the cultivated expertise, what kinds of scientific fields are being worked on, etc.? It is clear that if you have gobs of money and manpower, it's relatively easy to build a top-performing cluster and run impressive benchmark scores, but where are the papers and peer reviews?
 
I only know that the Fermi cluster at CAS, which ranks 21st on the TOP500, works, as stated, on process engineering, mostly concerned with the relationships within molecules. What's more, I heard that an older one, Dawning 5000A, which may rank 60+ now, is now controlled by some oil company. But I still don't know what projects Nebulae and Tianhe-1A are running now either.
 