NEC single-chip parallel processor

More info: http://www.nec.co.jp/press/en/0302/1001.html

It's clearly not targeted at consumer products, but it's interesting nonetheless.
 
cybamerc said:
More info: http://www.nec.co.jp/press/en/0302/1001.html

It's clearly not targeted at consumer products, but it's interesting nonetheless.

Yes, but the technology lays the foundation which can be used for consumer products. The power consumption of that chip is amazing considering it can do the same work as four 3GHz P4s 8)

This achievement is equivalent to putting the total processing power of four of the latest model personal computers inside a personal digital assistant (PDA).

Keep in mind that this is only using a 0.18µm process and only 32 million transistors ;) Since it's parallel, scaling to multiple cores or PEs should be straightforward.

I wonder if NEC is going to use this technology in their PaPeRo autonomous robot in the near future in some form or another. :eek:
 
PC-Engine said:
Yes, but the technology lays the foundation which can be used for consumer products. The power consumption of that chip is amazing considering it can do the same work as four 3GHz P4s 8)
Really? Any benchmarks to back up the theoretical performance?
Keep in mind that this is only using a 0.18µm process and only 32 million transistors ;) Since it's parallel, scaling to multiple cores or PEs should be straightforward.
How would one extract a level of parallelism that would even remotely utilize that many cores?
 
KnightBreed said:
PC-Engine said:
Yes, but the technology lays the foundation which can be used for consumer products. The power consumption of that chip is amazing considering it can do the same work as four 3GHz P4s 8)
Really? Any benchmarks to back up the theoretical performance?
Keep in mind that this is only using a 0.18µm process and only 32 million transistors ;) Since it's parallel, scaling to multiple cores or PEs should be straightforward.
How would one extract a level of parallelism that would even remotely utilize that many cores?

That's what NEC presented at the IEEE International Solid-State Circuits Conference. I have no reason to doubt their claims, considering their CPUs power the Earth Simulator ;)

Each PE can execute 4-way very long instruction word (VLIW) instructions (a format in which four instructions are optimized via compiler scheduling and packed as a single long instruction). The PEs operate on the SIMD principle (Single Instruction, Multiple Data: multiple data elements processed by one instruction), with 128 PEs linked together in a ring formation.

The proposed processor chip achieves efficient parallel processing, with maximum PE array utilization, by embedding the following hardware features:
1) A memory block assigned to each PE, which enables independent indexed memory access by each PE.
2) A shift-register mechanism that automatically transfers and assigns a row of image pixels from the input video data to each PE.
3) A ring connection between the PEs plus direct register-to-register connections, enabling fast data transfer between PEs.
4) A combined pipeline structure between the on-chip control processor and the PEs, for efficient data exchange.
5) Special instructions that accelerate switching operations depending on the condition of each PE's data.

The PEs are programmed in an extended C language to which parallel data structures, dedicated operators and dedicated control syntax have been added, so that the parallel operations of the ring-connected PEs can be specified in detail.

Technology: 0.18 µm CMOS, 7 metal layers
Supply Voltage: 1.8 V (internal), 3.3 V (I/O)
Clock: 100 MHz (at 1.8 V)
Number of Transistors: 32.7M
Area: 11 mm x 11 mm
Number of Pins: 332 (signal), 500 (total)
Power: 2.5 W to 4 W (at 1.8 V)
Performance: 51.2 GOPS (at 1.8 V)
I-Cache: 32 kB (2-way set associative, 256 B line)
D-Cache: 2 kB (2-way set associative, 64 B line)
Image RAM: 256 kB (2 kB x 128)
Instruction Issue: max. 4-issue x 128 processing elements
Data: 16 bit x 1, 8 bit x 128
Bus Interface: 64-bit SDRAM, 32-bit PCI/CPU, I2C

NEC isn't the kind of company that hypes things ;)
 
If NEC does use this technology in their next console endeavor, it would likely be for Nintendo.

Hmm... I wonder how this processing architecture matches up against Cell.
 
From a German news service (translated from German):

ISSCC: Cars learn to see with NEC's video chip
In the Japanese parallel processor IMAP-CE, 128 video processors compute in parallel. NEC envisions it as an intelligent autopilot in cars and other transport systems, helping, for example, to avoid accidents. With a peak performance of 51.2 GOPS (billion operations per second), the chip, presented at the International Solid State Circuit Conference (ISSCC), can process video from a car's front and side cameras while driving and thereby detect insufficient following distances or hazardous situations. The high degree of parallelism of the processor array allows different recognition algorithms to take changing driving situations, road boundaries and obstacles into account.

NEC's goal was to design a highly efficient processor for moving-image processing that is more flexible than existing ASIC solutions while using less energy. At a clock of 100 MHz, IMAP-CE computes about four times faster than a 2.8 GHz PC and needs at most four watts. That makes it roughly 100 times more energy-efficient than, for example, a Pentium 4.

For the 128 8-bit processors, each with 2 KB of memory, NEC needs 32 million transistors, which fit on 11 mm x 11 mm. Samples are currently being manufactured in 0.18 µm technology. "But we have even more efficient processes," says development lead Shorin Kyo, offering an outlook. NEC is currently negotiating with Japanese carmakers but does not yet want to name any. The recognition algorithms are programmed in the standard C language, and NEC supplies the compiler with special extensions for parallelization. (Erich Bonnert)

Fredi
 
Sounds similar to Cell, although I have seen no mention of eDRAM, which I think would be fundamental to achieving optimal real-world performance...

Some of NEC's ideas seem to be taken from that Cell patent (assigning a memory block to each of the 128 PEs, etc.)

P4: 2 ALUs * 6 GHz (double-pumped ALUs) * 2 (32-bit ops/cycle) = 24 GOPS

IMAP-CE: 128/4 PEs (each PE is 8 bits, so four PEs make one 32-bit unit) * 100 MHz = 3.2 GOPS

uhm...

That SI link must be way off then...

Also, looking at this (and at the 64-bit SDRAM interface used):

I-Cache: 32kB (2 Way Set Associative, 256B Line)
D-Cache: 2kB (2 Way Set Associative, 64B Line)
Image RAM: 256kB (2kB x 128)

I wonder if, in real-world applications, this might be even less efficient than Cell...
 