ISSCC 2005

Thank you SiBoy! Your posts are really informative :)
But...I hope SPUs have a division, or at least a reciprocal instruction?!
Any other complex FP op? :)
That's the last question..I swear :)
 
version said:
Hmm, I'm disappointed, man: it doesn't run integer and floating point in parallel.
There may be re-use of hardware between int and float execution units, so running them simultaneously would probably have meant more transistors needed, plus more bandwidth from register file, local storage memories etc...
 
nAo said:
Thank you SiBoy! Your posts are really informative :)
But...I hope SPUs have a division, or at least a reciprocal instruction?!
Any other complex FP op? :)
That's the last question..I swear :)

Divide - no

Reciprocal - yes (there is a table look-up for reciprocal approx. in the odd pipe)
 
Guden Oden said:
version said:
Hmm, I'm disappointed, man: it doesn't run integer and floating point in parallel.
There may be re-use of hardware between int and float execution units, so running them simultaneously would probably have meant more transistors needed, plus more bandwidth from register file, local storage memories etc...

This is not the problem; latency is a BIG nightmare.
 
SiBoy said:
nAo said:
Thank you SiBoy! Your posts are really informative :)
But...I hope SPUs have a division, or at least a reciprocal instruction?!
Any other complex FP op? :)
That's the last question..I swear :)

Divide - no

Reciprocal - yes (there is a table look-up for reciprocal approx. in the odd pipe)


Reciprocal latency, please?
 
Megadrive1988 said:
Cell Q&A (totally 'meh' though)

http://business.timesonline.co.uk/article/0,,9075-1475581,00.html

Financial analysts have suggested that the high cost of the PlayStation 3, which they predict will have to be priced at between $500 and $750 when it is launched in the United States, will deter potential buyers. As always with computers, though, the price is likely to fall quite rapidly with time.
Are those prices for real? :oops:

I'm sure we'll all agree, a console simply won't hit mass market acceptance until it drops to $300 or less.
 
version said:
6-7??? WTF, that is a lot; the PS2 VU has a latency of 4.
Dude - this is a 4GHz chip, I was half expecting latencies to be on the order of 10-12...
And VU had exceptionally low latencies - most other SIMD designs are considerably worse (DC SH4 was in 7s too, PSP VFPU is also higher, Gekko's pairedSingles aren't 4/1 most of the time either).

SiBoy,
How often can reciprocal lookups be issued? I don't suppose I could do it every cycle like FMADDs :p Also yes, do you know the latency for it?
 
SiBoy said:
nAo said:
Thank you SiBoy! Your posts are really informative :)
But...I hope SPUs have a division, or at least a reciprocal instruction?!
Any other complex FP op? :)
That's the last question..I swear :)

Divide - no

Reciprocal - yes (there is a table look-up for reciprocal approx. in the odd pipe)

I wonder if the SPUs have inherited the VMX "mul, sub, make a cup of tea" op that's great for Newton-Raphson...
Would be kind of neat if they had; who needs divide with approx recip and Newton-Raphson?
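
For anyone who wants to see how the approx-recip plus Newton-Raphson trick plays out, here is a minimal Python sketch. The names reciprocal_estimate, refine and divide are made up for illustration, and the linear estimate is only a stand-in for the hardware table look-up (its real precision isn't given in this thread). Each refinement step is one negative multiply-subtract followed by one multiply-add; VMX has such an op (vnmsubfp), which is presumably the one meant above.

```python
import math

def reciprocal_estimate(d):
    # Hypothetical stand-in for a hardware table look-up. Assumes d > 0.
    m, e = math.frexp(d)                     # d = m * 2**e with 0.5 <= m < 1
    est = 48.0 / 17.0 - (32.0 / 17.0) * m    # linear fit to 1/m, relative error <= 1/17
    return math.ldexp(est, -e)               # scale back: 1/d = (1/m) * 2**-e

def refine(d, x):
    # One Newton-Raphson step for f(x) = 1/x - d, i.e. x_next = x + x * (1 - d * x).
    r = 1.0 - d * x       # negative multiply-subtract: -(d * x - 1)
    return x + x * r      # multiply-add

def divide(a, d, steps=4):
    # a / d computed as a * (1/d), with no divide instruction after the estimate.
    x = reciprocal_estimate(d)
    for _ in range(steps):
        x = refine(d, x)
    return a * x
```

The relative error roughly squares on every step, so with this crude estimate a handful of steps is enough for double precision; a more accurate table look-up would need fewer steps.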
 
Hi guys. Slightly off topic: while surfing the web this morning, I came across something I had not read about before, Intel's Tanglewood CPU. It will apparently have up to 16 cores. I notice there has been talk about Tanglewood since at least 2003. Any idea how a 16-core Tanglewood CPU might compare to an 8 SPU/SPE Cell?
 
Fafalada said:
version said:
6-7??? WTF, that is a lot; the PS2 VU has a latency of 4.
Dude - this is a 4GHz chip, I was half expecting latencies to be on the order of 10-12...
And VU had exceptionally low latencies - most other SIMD designs are considerably worse (DC SH4 was in 7s too, PSP VFPU is also higher, Gekko's pairedSingles aren't 4/1 most of the time either).

SiBoy,
How often can reciprocal lookups be issued? I don't suppose I could do it every cycle like FMADDs :p Also yes, do you know the latency for it?

SPE code :

VLOAD r,r,r
nop
nop
nop
nop
nop
VMADD r,r,r
nop
nop
nop
nop
nop


10-30% performance
 
version said:
Fafalada said:
version said:
6-7??? WTF, that is a lot; the PS2 VU has a latency of 4.
Dude - this is a 4GHz chip, I was half expecting latencies to be on the order of 10-12...
And VU had exceptionally low latencies - most other SIMD designs are considerably worse (DC SH4 was in 7s too, PSP VFPU is also higher, Gekko's pairedSingles aren't 4/1 most of the time either).

SiBoy,
How often can reciprocal lookups be issued? I don't suppose I could do it every cycle like FMADDs :p Also yes, do you know the latency for it?

SPE code :

VLOAD r,r,r
nop
nop
nop
nop
nop
VMADD r,r,r
nop
nop
nop
nop
nop


10-30% performance

Ahem... this is not very optimized code, is it ?
 
Back on topic. Here's a pretty sweet little article:

http://www.poughkeepsiejournal.com/today/frontpage/stories/fr020805s1.shtml


Tiny new chip packs a lot of power
IBM East Fishkill plant to make microprocessor
By Craig Wolf
Poughkeepsie Journal

What appears to be the hottest microprocessor chip in the world looked even hotter Monday as IBM Corp., Sony Group and Toshiba Corp. revealed its performance is 10 times that of current chips.

The tech trio's ''Cell'' chip will be made at IBM's 300-millimeter plant in East Fishkill this summer to satisfy needs of Sony and Toshiba for a new generation of broadband, video-hungry home entertainment systems. The plant already makes a version of the chip being used in computer workstations suited to game development. The plant is being expanded by IBM and six partners, including Sony and Toshiba.

New Cell details came out in San Francisco at the International Solid State Circuits Conference, a major annual technical gathering. Monday's revelations were whoppers.

8 processors per chip

The chip has eight cores, or separate processors, that operate synergistically. Industry chatter was predicting four cores. It has run at speeds of better than 4 gigahertz, or billions of cycles per second, somewhat ahead of Intel Corp.'s best speeds. It can run several kinds of software simultaneously, including Linux and proprietary gaming programs.

More details are to come out today, but analysts were impressed already.

''This is still the biggest chip technology advance in probably 20 years,'' said Richard Doherty, research director at Envisioneering Group in Seaford, Nassau County.

If anything, claims of a 10-fold leap in performance are understated, Doherty said. ''Our estimate is 10 to 20, so they're being conservative,'' he said.

He added Cell developers said they could have put 16 cores on the same size chip if they had thought it necessary.


Touted as a ''supercomputer on a chip'' by the trio, Cell may well find use in business environments including supercomputing, but its first and main function will be to lift the computerized entertainment world to new levels.

Cell is aimed at your house. The chip is expected to be used in Sony's next-generation Playstation as well as in high-definition television sets.

''It's very flexible,'' said Jim Kahle, an IBM fellow, quoted by the Associated Press. ''We support many operating systems with our virtualization technology so we can run multiple operating systems at the same time, doing different jobs on the system.''

What that could mean, for example, is that gamers in different locations could play online using Linux, an open-source software widely available around the world. The chip could handle that work as well as running Sony's software that controls the game logic and characters.

''Having hierarchical operating systems, if you will, that's a whole big step in computing,'' Doherty said. ''In entertainment, that's tremendous.''

Cell was developed by the trio at IBM's Austin, Texas, facility beginning in 2001. It contains 234 million transistors in a space of 221 square millimeters, about the size of a fingernail. It's made with a 90-nanometer process, so called because it creates features that small, a nanometer being a billionth of a meter.

Executives toasted their teamwork in a statement issued Monday. Masashi Muromachi, a corporate vice president of Toshiba, said, ''We are proud that Cell, a revolutionary microprocessor with a brand new architecture that leapfrogs the performance of existing processors, has been created through a perfect synergy of IBM, Sony Group and Toshiba's capabilities and talented resources.''

Whether the team's work eclipses that of leader Intel remains to be seen. Monday was also the day on which Intel said it was now making a two-processor chip.

''It's poor timing,'' Doherty said. ''The twin-piston engine comes out the same day as a V-8.''
 
"Ahem... this is not very optimized code, is it ?"

Yes, it's not optimised, but with 6-7 cycle latency it's impossible to write optimised code, unless you are GOD :)
 
SPE code :

VLOAD r,r,r
nop
nop
nop
nop
nop
VMADD r,r,r
nop
nop
nop
nop
nop

Glad you're not working here, you might as well write your VU code in BASIC :)

Yes, it's not optimised, but with 6-7 cycle latency it's impossible to write optimised code, unless you are GOD

Think you've gotten the wrong end of the stick. It takes 6-7 cycles to finish an instruction, but a new instruction can be started every cycle. Remember, the units are pipelined.

So assuming you keep the pipeline busy, e.g. with a vertex transform, then by the time you have finished transforming 1000000+ vertices you only have to wait 7 cycles for the last one to be completed.

Now keeping them busy and full of data, there's the hard part.
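
To put numbers on that, here is a back-of-the-envelope Python sketch. It assumes an idealised pipeline that issues one independent op per cycle with a fixed latency; the function names are made up for illustration and nothing here models the real SPE issue rules.

```python
def pipelined_cycles(n_ops, latency):
    """Cycles to finish n_ops independent ops when one can issue every cycle."""
    if n_ops == 0:
        return 0
    # The last op issues on cycle n_ops - 1 and completes `latency` cycles later.
    return n_ops + latency - 1

def serialized_cycles(n_ops, latency):
    """Cycles if every op has to wait for the previous one to complete."""
    return n_ops * latency

if __name__ == "__main__":
    n, lat = 1_000_000, 7
    print(pipelined_cycles(n, lat), serialized_cycles(n, lat))
```

For a million ops at latency 7 this gives 1000006 cycles fully pipelined versus 7000000 if each op waits for the one before it, so the latency only really costs you when the instructions depend on each other.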
 
He means it is difficult to mentally juggle that many outstanding instructions if you are assembly programming ... true enough, but even a macro assembler can unroll loops for you.
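
A rough way to see what unrolling buys you is a toy Python model of an in-order, single-issue pipe, where each op depends on the previous op in its own chain and unrolling means interleaving several independent chains. This is an assumed simplification with a made-up function name, not how the actual SPE scheduler works.

```python
def schedule_cycles(n_chains, chain_len, latency):
    """Cycle on which the last result is ready when n_chains independent
    dependency chains are issued round-robin, one op per cycle at most."""
    ready = [0] * n_chains   # cycle at which each chain's latest result is available
    cycle = 0                # next free issue slot
    for _ in range(chain_len):
        for c in range(n_chains):
            issue = max(cycle, ready[c])   # stall until this chain's input is ready
            ready[c] = issue + latency
            cycle = issue + 1
    return max(ready)

if __name__ == "__main__":
    print(schedule_cycles(1, 4, 6))   # one chain, no unrolling
    print(schedule_cycles(6, 4, 6))   # six chains interleaved
```

With a latency of 6, a single chain of 4 dependent ops takes 24 cycles, while six interleaved chains get 24 ops done in 29 cycles, so interleaving as many independent chains as the latency is what fills the nop slots.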
 