ISSCC 2005

OK if it is fully pipelined and works on 6 vertices at the same time, but the code is big

vload r1,r,r
vload r2,r,r
vload r3,r,r
vload r4,r,r
vload r5,r,r
vload r6,r,r
vmul r,r,r1
vmadd r,r,r2
vmadd r,r,r3
vmadd r,r,r4
vmadd r,r,r5
vmadd r,r,r6
vmul r,r,r1
vmadd r,r,r2
vmadd r,r,r3
vmadd r,r,r4
vmadd r,r,r5
vmadd r,r,r6
vmul r,r,r1
vmadd r,r,r2
vmadd r,r,r3
vmadd r,r,r4
vmadd r,r,r5
vmadd r,r,r6
vmul r,r,r1
vmadd r,r,r2
vmadd r,r,r3
vmadd r,r,r4
vmadd r,r,r5
vmadd r,r,r6


This is a matrix-vertex multiply on 6 vertices. It is a joke :D, but there are no stalls.
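
For anyone who finds the listing hard to read, here is a rough C sketch of the same idea: transform 6 vertices together and order the work so that back-to-back operations belong to different vertices. The names `vec4`, `BATCH` and `transform_batch` are made up for illustration and the column-major matrix layout is an assumption; whether a given compiler turns this into a stall-free schedule depends on the target.

```c
#include <stddef.h>

#define BATCH 6 /* vertices in flight at once, matching the listing above */

typedef struct { float v[4]; } vec4; /* x, y, z, w */

/*
 * Hypothetical helper (not from the thread): out[i] = M * in[i] for BATCH
 * vertices, where cols[c] is column c of the 4x4 matrix M.
 *
 * The vertex loop sits inside the column loop, so the accumulate for a given
 * vertex is separated from its previous accumulate by the work for the other
 * five vertices. That spacing is what hides the multiply-add latency.
 */
static void transform_batch(const vec4 cols[4], const vec4 in[BATCH],
                            vec4 out[BATCH])
{
    /* Step 1: one multiply per vertex (column 0 scaled by x). */
    for (size_t i = 0; i < BATCH; ++i)
        for (size_t k = 0; k < 4; ++k)
            out[i].v[k] = cols[0].v[k] * in[i].v[0];

    /* Steps 2-4: one multiply-add per vertex for each remaining column. */
    for (size_t c = 1; c < 4; ++c)
        for (size_t i = 0; i < BATCH; ++i)
            for (size_t k = 0; k < 4; ++k)
                out[i].v[k] += cols[c].v[k] * in[i].v[c];
}
```

That is 6 multiplies plus 18 multiply-adds on 4-wide vectors, the same 24 vector operations as the listing. The price is the one already mentioned: more code and more live registers.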
 
OK if it is fully pipelined and works on 6 vertices at the same time, but the code is big

Yes, size versus speed is the tradeoff in such a simple case. In more complicated cases, however, the size issue shrinks, because loop unrolling is no longer needed to hide data fetches.


Anyway, this is getting OT, as these are general issues affecting programming on pretty much every processor available in the past 10 or so years, and are not unique to Cell.
 
He is not loop unrolling for data fetches, or at least not any more than for the latency of every other instruction he used in there.
 
version said:
this is not problem
How would YOU know how much extra hardware might be needed to run both int and float ops simultaneously? :p The SPU is already 21 million transistors; that's what, 2/3 the size of the original AMD Athlon? Besides, local storage only delivers 128 bits of data per cycle. If you don't have all your data in registers already, chances are very good you're not going to see any speedup from simultaneous execution anyway. That's likely part of the reason why STI didn't make the bugger do simultaneous execution in the first place.

latency is a BIG nightmare
You complain way too much. If you're going to keep whining like that, better stay away from the gaming business altogether... Go work for the Microsoft Office team instead. There, nobody is going to ask you to write high-performing tight code! :D
 
Megadrive1988 said:
pc999 said:
Is that for consumers, i.e., not servers or the like?

Intel's Tanglewood is no doubt for servers, workstations and whatnot, not consumers

In that case it is better to ask about performance per die size; they usually make real monsters in that market.
 
Dunno if it is true, but a guy called KelleyCook on Ars Forums wrote that:
For what it's worth, the SPE ISA itself is not new and is already included within GCC 3.4.

It comes with many of Freescale's—nee Motorola—PPC cores. They call their base core, e500.
EDIT: I googled around and it seems e500 doesn't support 4-sp vectors :?

ciao,
Marco
 
version said:
yes, a 2-cycle imprecise result, and more cycles with iterations (Newton-Raphson method)
Actually I was mistaken, only 1 lookup and 1 iteration needed for single precision result with 3DNow! ... the iteration is just broken up in 2 steps because of the lack of FMA.

Anyway as I said, 30 cycles is an overestimation.
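
To make the "1 lookup and 1 iteration" point concrete, here is a small C sketch using a reciprocal as the example (the thread does not say which operation was being timed, so that choice is an assumption). `recip_estimate` is a made-up stand-in for a hardware lookup: it simply truncates a full-precision result to roughly 12 good bits. `recip_refine` is one Newton-Raphson step, x1 = x0 * (2 - a * x0), which roughly doubles the number of correct bits and is written as two separate operations because there is no fused multiply-add.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal lookup: computes 1/a and
   throws away the low 12 mantissa bits, leaving roughly 12 good bits.
   Real hardware would use a table, not a divide. */
static float recip_estimate(float a)
{
    float x = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~(uint32_t)0xFFF; /* clear the low 12 mantissa bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* One Newton-Raphson step for the reciprocal of a, starting from x0. */
static float recip_refine(float a, float x0)
{
    float t = 2.0f - a * x0; /* step 1: correction factor */
    return x0 * t;           /* step 2: apply it to the estimate */
}
```

Calling `recip_refine(a, recip_estimate(a))` takes the roughly 12-bit estimate to close to full single precision, which is why a second iteration is not needed.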
 
The e500 doesn't support *any* vectors at all without SPE... However I think this person is confusing Motorola's SPE (their 2nd SIMD arch) for the SPE units in Cell...
 
Megadrive1988 said:
''This is still the biggest chip technology advance in probably 20 years,'' said Richard Doherty, research director at Envisioneering Group in Seaford, Nassau County.

If anything, claims of a 10-fold leap in performance are understated, Doherty said. ''Our estimate is 10 to 20, so they're being conservative,'' he said.

He added Cell developers said they could have put 16 cores on the same size chip if they had thought it necessary.

Whether the team's work eclipses that of leader Intel remains to be seen. Monday was also the day on which Intel said it was now making a two-processor chip.

''It's poor timing,'' Doherty said. ''The twin-piston engine comes out the same day as a V-8.''

Ahh, ANALysts. So called because they have their head up.....

10x? bah, I can get coprocessors today that do 100 Gflops+ with all the same restrictions and programming headaches.

Aaron Spink
speaking for myself inc.
 
Something that I asked about but never got a solid explanation for: since when has it been known that Cell does not have any eDRAM?

What happened to the 16-64 MB of eDRAM that was supposed to be one of the major advantages of Cell?
 
Megadrive1988 said:
Something that I asked about but never got a solid explanation for: since when has it been known that Cell does not have any eDRAM?

What happened to the 16-64 MB of eDRAM that was supposed to be one of the major advantages of Cell?

My guess would be they figured it was too difficult to manufacture the eDRAM at 65nm so they increased the LS and cache instead. Also the eDRAM was probably too slow? If you recall the eDRAM in EE+GS@90nm wasn't using 90nm. It was using 130nm IIRC.
 
PC-Engine said:
Megadrive1988 said:
Something that I asked about but never got a solid explanation for: since when has it been known that Cell does not have any eDRAM?

What happened to the 16-64 MB of eDRAM that was supposed to be one of the major advantages of Cell?

My guess would be they figured it was too difficult to manufacture the eDRAM at 65nm so they increased the LS and cache instead. Also the eDRAM was probably too slow? If you recall the eDRAM in EE+GS@90nm wasn't using 90nm. It was using 130nm IIRC.


ok I suppose that all makes sense. well, Sony had better compensate for the lack of eDRAM with lots more external main memory.
<that's the ram hugger in me talking> :LOL:
 
My guess would be they figured it was too difficult to manufacture the eDRAM at 65nm so they increased the LS and cache instead. Also the eDRAM was probably too slow?

eDRAM would be *much* easier than logic to deal with...

Plus I dunno about the slow part... While eDRAM has a longer latency than SRAM does, the much higher density offered by eDRAM means less wire delay than you get with SRAM, which can almost offset the latency penalty suffered by eDRAM...
 
Something that I asked about but never got a solid explanation for: since when has it been known that Cell does not have any eDRAM?

What happened to the 16-64 MB of eDRAM that was supposed to be one of the major advantages of Cell?

Like I said before, the patent doesn't mention eDRAM. It was because of the 1024-bit bus that people assumed it was eDRAM.

Though with just a single Cell, do you think 25GB/s of memory bandwidth is sufficient for the PS3 to feed Cell and the NV GPU without eDRAM somewhere in the system?
 