CELL from GDC

Titanio said:
one said:
IBM has digests of 4 Cell technical papers for ISSCC 2005. Search "cell processor" from here.

Thanks. There's also an article there I hadn't seen before (if you search for Cell, it's called "Cell moves into the limelight", it was from mpronline.com).

The article seems to have been written with insight from IBM. A quote from it:

"In designing the BPA, IBM looked at different workloads in areas
of cryptography, graphics transform and lighting, physics,
fast-Fourier transforms, matrix math, and other more scientific
workloads."

So..vertex processing, right? Yeah yeah, I know, nothing's certain...just vertex processing on Cell with pixel processing on the GPU is a favoured PS3 configuration for me ;)

Do we know enough now to work out the theoretical peak vertex performance per second of one SPE @ 4GHz? What transformation is used when calculating such peaks?

Looking at the SPEs' local memory, they could each hold ~16,000 vertices at a time...right (?) You can fetch a vertex every 8 cycles? Sorry, this isn't my forte, but I'd like to learn ;)

edit - doh, mixing up bits and bytes :rolleyes:
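
A quick back-of-the-envelope check in C, assuming 256KB of local store per SPE (per the ISSCC papers) and a minimal 16-byte vertex of four floats (a real vertex with normals and UVs would be larger, so this is an upper bound):

    /* Sanity check on the "~16,000 vertices per SPE" figure.
       Assumes 256KB local store and a minimal 4-float vertex. */
    #include <stdio.h>

    int main(void) {
        int ls_bytes     = 256 * 1024;               /* SPE local store */
        int vertex_bytes = 4 * (int)sizeof(float);   /* x, y, z, w */
        printf("%d vertices fit in LS\n", ls_bytes / vertex_bytes);
        return 0;
    }

That prints 16384, matching the ~16,000 estimate, though in practice you'd reserve some of the LS for code and output buffers.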


Cell can do 4 gigapolys/sec,
but the max polygon count is unimportant, because this is just 2 polys with NVIDIA's tech:
displ.JPG
 
version said:
Cell can do 4 gigapolys/sec,
but the max polygon count is unimportant, because this is just 2 polys with NVIDIA's tech:
displ.JPG

Cheers. I've seen that number before, but I've always wondered how it was derived. Can anyone break it down for me? I know it might be asking a lot... I'd really, really appreciate it though! :D

Also, cool pic. Displacement mapping? Is that a 100% pixel shading op?
 
Here's another quote from that article. I don't know if it's significant, but..

The processing capability of the Cell processor is still being
explored. One demo that IBM has showed was a detailed 3D
contour map with satellite images imposed on the geography.
The Cell processor can render the ray-cast graphics at around
an order of magnitude faster than a contemporary PC processor.

"Ray-cast" graphics?

Of course, it doesn't say it was a real time demo..
 
version said:
Vysez said:
version said:
Cell can do 4 gigapolys/sec,
but the max polygon count is unimportant, because this is just 2 polys with NVIDIA's tech:
http://web.axelero.hu/varga1973/displ.JPG
What's that? 3D textures? If it's 3D textures, nobody will use them on a console anyway, because of the memory-hog nature of 3D textures.

No, this is per-pixel displacement mapping.

Distance mapping, a.k.a. per-pixel displacement mapping with distance functions. It still uses 3D textures, but it's not as bad a hog:

http://www.beyond3d.com/forum/viewtopic.php?t=20571
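
For the curious, here's a rough CPU-side sketch in C of the distance-function idea (on the GPU this loop runs per pixel in the shader, sampling the 3D texture). Everything here, from the volume resolution to the toy height field, is illustrative:

    /* Sketch of distance mapping's inner loop: march a ray through a 3D
       distance texture, stepping by the stored distance each iteration. */
    #include <stdio.h>
    #include <math.h>

    #define N 32  /* 3D distance texture resolution (illustrative) */

    static float dist[N][N][N];  /* distance to the nearest surface */

    /* Toy "displaced surface": a flat plane at z = 0.5. */
    static void build_volume(void) {
        for (int z = 0; z < N; z++)
            for (int y = 0; y < N; y++)
                for (int x = 0; x < N; x++)
                    dist[z][y][x] = fmaxf(0.0f, z / (float)N - 0.5f);
    }

    /* Each step is safe: the texel holds the distance to the nearest
       surface, so the ray can never step through it. */
    static int march(float p[3], const float d[3]) {
        for (int i = 0; i < 64; i++) {
            int x = (int)(p[0] * N), y = (int)(p[1] * N), z = (int)(p[2] * N);
            if (x < 0 || x >= N || y < 0 || y >= N || z < 0 || z >= N)
                return 0;                    /* left the volume: miss */
            float t = dist[z][y][x];
            if (t < 1.0f / N) return 1;      /* close enough: hit */
            p[0] += d[0] * t; p[1] += d[1] * t; p[2] += d[2] * t;
        }
        return 0;
    }

    int main(void) {
        build_volume();
        float p[3] = {0.5f, 0.5f, 0.9f}, d[3] = {0.0f, 0.0f, -1.0f};
        printf(march(p, d) ? "hit\n" : "miss\n");
        return 0;
    }

The 3D texture only stores one scalar per texel, which helps keep the memory cost down relative to general 3D textures.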
 
Cheers. I've seen that number before, but I've always wondered how it was derived.
I believe it was: 8 cycles per transform, 8 SPUs @ 4GHz = 4 GTransforms/s.

It's an offhand estimate, because no one knows exactly how fast the SPE can do reciprocals/divides (which is the assumed speed limit for the transform).
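
To spell the arithmetic out, a minimal sketch in C; the 8-cycles-per-transform figure is the assumption here:

    /* Back-of-the-envelope check of the 4 GTransforms/s figure:
       8 SPEs at 4GHz, assuming 8 cycles per vertex transform. */
    #include <stdio.h>

    int main(void) {
        double clock_hz       = 4.0e9;  /* 4GHz */
        int    spes           = 8;
        int    cycles_per_vtx = 8;      /* the assumed cost per transform */
        printf("%.1f GTransforms/s\n",
               spes * (clock_hz / cycles_per_vtx) / 1e9);
        return 0;
    }

which prints 4.0, as advertised.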
 
If you use Bézier patches with 16-bit integers (8×16-bit SIMD in the SPE), Cell can do 500 million patches/sec.

If the GPU then tessellates a patch (16×16) into 512 polys, the max polygon count would be 250 gigapolys/sec :)
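
Running the same kind of sanity check on that: a 16×16 tessellation gives 256 quads, i.e. 512 triangles per patch, so:

    /* version's estimate: 500M patches/s * 512 triangles per patch. */
    #include <stdio.h>

    int main(void) {
        double patches_per_sec = 500e6;
        int    tris_per_patch  = 16 * 16 * 2;  /* 16x16 quads, 2 tris each */
        printf("%.0f Gpolys/s\n", patches_per_sec * tris_per_patch / 1e9);
        return 0;
    }

which comes out at 256 Gpolys/s, in line with the rounded 250 figure above.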
 
Fafalada said:
Cheers. I've seen that number before, but I've always wondered how it was derived.
I believe it was: 8 cycles per transform, 8 SPUs @ 4GHz = 4 GTransforms/s.

It's an offhand estimate, because no one knows exactly how fast the SPE can do reciprocals/divides (which is the assumed speed limit for the transform).

Thanks for that! :) Do you know what transformation is (assumed to be) 8 cycles? The technically simplest transformation I can think of is a non-homogeneous coordinate translate, aka 3 additions ;) Is there a standard transformation used for calculating theoretical peaks?

Also, has anyone ventured to try and factor in memory access? Loading to LS, loading from LS to registers? Are we factoring in pipelining?

Heh, it could be good practice until your PS3 kit arrives ;)
 
Thanks for that! :) Do you know what transformation is (assumed to be) 8 cycles? The technically simplest transformation I can think of is a non-homogeneous coordinate translate, aka 3 additions
The standard minimum transform used for this is 4x4 matrix * 4x1 vector + perspective transform (divide + mul).
I.e. on the PS2 VU, where your divide is on the second pipeline, the minimum transform is 5 cycles (4 for the matrix transform and 1 to scale with the perspective factor).
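
For anyone following along, that minimum transform looks something like this in plain scalar C (not the SIMD form a VU or SPE would actually run, where each matrix row maps to one madd):

    /* The standard "minimum transform": 4x4 matrix * 4x1 vector,
       then a perspective divide (one reciprocal + muls). */
    #include <stdio.h>

    typedef struct { float x, y, z, w; } vec4;

    static vec4 transform(float m[4][4], vec4 v) {
        vec4 r;
        r.x = m[0][0]*v.x + m[0][1]*v.y + m[0][2]*v.z + m[0][3]*v.w;
        r.y = m[1][0]*v.x + m[1][1]*v.y + m[1][2]*v.z + m[1][3]*v.w;
        r.z = m[2][0]*v.x + m[2][1]*v.y + m[2][2]*v.z + m[2][3]*v.w;
        r.w = m[3][0]*v.x + m[3][1]*v.y + m[3][2]*v.z + m[3][3]*v.w;
        float inv_w = 1.0f / r.w;   /* the reciprocal that sets the limit */
        r.x *= inv_w; r.y *= inv_w; r.z *= inv_w;
        return r;
    }

    int main(void) {
        float identity[4][4] = {{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1}};
        vec4 out = transform(identity, (vec4){1.0f, 2.0f, 3.0f, 1.0f});
        printf("%f %f %f\n", out.x, out.y, out.z);
        return 0;
    }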

And yes, all these calculations assume pipelining is used to hide latencies (memory or otherwise). Streaming processes are nice like that.

Heh, it could be good practice until your PS3 kit arrives
Nah, I'll just wait, it shouldn't be too long now anyhow. Besides, without detailed CPU docs, this kind of exercise isn't worth much.
 
Fafalada said:
The standard minimum transform used for this is 4x4 matrix * 4x1 vector + perspective transform (divide + mul).
I.e. on the PS2 VU, where your divide is on the second pipeline, the minimum transform is 5 cycles (4 for the matrix transform and 1 to scale with the perspective factor).

Cheers..I guess that's a little fairer and a little more realistic than my idea ;)

Nah, I'll just wait, it shouldn't be too long now anyhow. Besides, without detailed CPU docs, this kind of exercise isn't worth much.

Fair enough, it did cross my mind that if we engaged in this, by the time we figured it all out, Sony would have released nicely formatted figures and specs anyway ;)

Here's hoping we get something by the end of the month. Is there much significance attached to the end of the month being the end of Sony's fiscal year - much reason for them to really try and get a PS3 announcement out by then?
 
Titanio said:
one said:
IBM has digests of 4 Cell technical papers for ISSCC 2005. Search "cell processor" from here.

Thanks. There's also an article there I hadn't seen before (if you search for Cell, it's called "Cell moves into the limelight", it was from mpronline.com).
Wow, thanks :D Seems like the full report is posted there. Save it locally before it's pulled (elsewhere on the IBM site they had a PowerPC 970MP document, but it was pulled after a while).

Another quote...
The tool chain for Cell is built on PowerPC Linux. The
programming of the SPE is based on C, with limited C++ support.
Software research is under way for Fortran and other
languages. Debugging tools include extensions for P-Trace
and extended Gnu debugger (GDB). The ultimate goal of the
software research is to build an abstraction layer on top of the
hardware that can scale with additional Cell processors or Cell
processors with differing amounts of resources. Programming
the Cell processor will be unlike programming any other
processor in mainstream use. It will require new tools, and
possibly a new programming paradigm, because programs for
the SPE should be self-contained with data and instruction
bundles (or Cells). This is not the same programming model
used for languages with strict class structures like Java.

Titanio said:
Here's hoping we get something by the end of the month. Is there much significance attached to the end of the month being the end of Sony's fiscal year - much reason for them to really try and get a PS3 announcement out by then?
Maybe trying to balloon their stock price to compensate for the lackluster 2004 performance of the Sony Group...
Microsoft disclosed so much info in Feb-March about their upcoming software/hardware, it looks like they did it to mitigate the projected March unveiling of the PS3. Now it would be hilarious if Sony abandoned the March unveiling idea... :devilish:
 
The MPR CELL article confirmed that SPEs can use their DMA engine to write to the PPE's L2 cache :). Does this also support the idea of DMA reads possibly being cacheable in some way in the CPU's L2 cache?
 
Panajev2001a said:
The MPR CELL article confirmed that SPEs can use their DMA engine to write to the PPE's L2 cache :). Does this also support the idea of DMA reads possibly being cacheable in some way in the CPU's L2 cache?

[slide: kaigai072.jpg]

[slide: kaigai075.jpg]

8)
 
Er Jaws, that's referring to a software DCache inside SPE local memory. Effectively that would be L1 cache.

What Pana talks about would still be a hardware function, I believe (and L2, thus higher latency).
 
Fafalada said:
Er Jaws, that's referring to a software DCache inside SPE local memory. Effectively that would be L1 cache.

What Pana talks about would still be a hardware function, I believe (and L2, thus higher latency).

Yes, he's talking about the PPE's L2 cache, but those slides suggest full software control for cache management that could be extended to the PPE's L2 cache?
 
Yes, he's talking about the PPE's L2 cache, but those slides suggest full software control for cache management that could be extended to the PPE's L2 cache?
Pretty much every decent CPU has full software control over cache management if you really need it. A working hardware scheme is generally just more efficient.

The point of the slides is that you can implement a software L1 scheme (and with a rather large L1 to boot), which could make SPE more suitable for general purpose tasks.
The DMA being cacheable through L2 would be an added benefit on top of that software scheme, not part of it.
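
To make the software-L1 idea concrete, here's a minimal sketch of a direct-mapped software cache in C. Everything here is illustrative: dma_get() is a hypothetical stand-in for the real MFC DMA intrinsics, and main_memory simulates XDR so the sketch actually runs on a PC:

    /* Illustrative software-managed cache living in SPE local store.
       dma_get() and main_memory are stand-ins, not the real Cell API. */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    #define LINE_SIZE 128   /* bytes per cache line */
    #define NUM_LINES 256   /* 32KB of local store spent on the cache */

    static uint8_t  main_memory[1 << 20];          /* simulated XDR */
    static uint8_t  lines[NUM_LINES][LINE_SIZE];   /* would live in LS */
    static uint32_t tags[NUM_LINES];
    static int      valid[NUM_LINES];

    /* Hypothetical stand-in for an MFC DMA transfer into local store. */
    static void dma_get(void *ls, uint32_t ea, size_t size) {
        memcpy(ls, &main_memory[ea], size);
    }

    /* Direct-mapped lookup: a hit costs a few instructions,
       a miss costs a DMA from main memory. */
    static uint8_t *cache_read(uint32_t ea) {
        uint32_t tag  = ea / LINE_SIZE;
        uint32_t line = tag % NUM_LINES;
        if (!valid[line] || tags[line] != tag) {   /* miss */
            dma_get(lines[line], tag * LINE_SIZE, LINE_SIZE);
            tags[line]  = tag;
            valid[line] = 1;
        }
        return &lines[line][ea % LINE_SIZE];
    }

    int main(void) {
        main_memory[12345] = 42;
        printf("%d\n", *cache_read(12345)); /* miss -> DMA, then read */
        printf("%d\n", *cache_read(12345)); /* hit */
        return 0;
    }

That's the "software L1" from the slides in miniature; the point above is that the DMA in the miss path could additionally be serviced out of the PPE's L2 instead of going all the way to memory.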
 
Fafalada said:
Yes, he's talking about the PPE's L2 cache, but those slides suggest full software control for cache management that could be extended to the PPE's L2 cache?
Pretty much every decent CPU has full software control over cache management if you really need it. A working hardware scheme is generally just more efficient.

The point of the slides is that you can implement a software L1 scheme (and with a rather large L1 to boot), which could make SPE more suitable for general purpose tasks.
The DMA being cacheable through L2 would be an added benefit on top of that software scheme, not part of it.

I think I know what Pana's referring to now, then: is it caching all the SPEs' DMA lists in the PPE's L2 cache? If so, I thought that was already possible, IIRC, from one of the IBM patents...
 
Sony Computer Entertainment Inc. is holding a COLLADA sponsored session at
Game Developers Conference 2005 in San Francisco.

COLLADA: An Open Interchange Format for the Interactive 3D Industry
Speakers:
... Remi Arnaud (Graphics Architect, Sony Computer Entertainment America)
... Mark Barnes (COLLADA Project Lead, Sony Computer Entertainment America)
Time/Date: Thursday (March 10, 2005) 12:00pm - 1:00pm
Track: Programming
Format: Sponsored Session
Experience Level: All - Open to All Levels

COLLADA allows artists working in XSI, Max, Maya to share data easily because it
enables the import and export of files. COLLADA also makes writing tools for
games easier because instead of having to write exporters for all the tools that
are used, one can simply write a COLLADA tool. This reduces development and
maintenance costs and reduces dependency on a particular brand of tool.

In the COLLADA sponsored session we will cover 3 areas of the project:

* Quality of tools for import and export. We are working on conformance test
suites to measure the robustness of plug-ins from our DCC partners like
Softimage. We'll demonstrate COLLADA content running on the PSP exported from
Maya.

* Ease of use with the upcoming COLLADA API. The new API is written in C++ and
provides an asset centric interface to a COLLADA database. The API is designed
to be a COLLADA reference implementation that is easy to integrate with
applications and run on multiple platforms.

* Future glimpse of COLLADA 2.0 features. We are designing new features for
COLLADA with various partners that include:

- Physics with Novodex/Ageia supporting rigid body dynamics.
- Shader Effects with NVIDIA, Softimage, 3Dlabs, and ATI to enhance COLLADA's existing
capabilities with programmable materials and shaders.

Please join SCEI at GDC 2005 to learn more about COLLADA!

http://collada.org/public_forum/viewtopic.php?t=88
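
As an aside on what "a COLLADA tool" amounts to: COLLADA documents are XML, so a minimal reader is just an XML walk. A hedged sketch in C using libxml2; the element name and usage here are purely illustrative, not the official COLLADA API:

    /* Toy "COLLADA tool": count <geometry> elements in a .dae file.
       Illustrative only; real tools would use the COLLADA API/schema. */
    #include <stdio.h>
    #include <libxml/parser.h>
    #include <libxml/tree.h>

    static int count_geometry(xmlNode *node) {
        int n = 0;
        for (xmlNode *cur = node; cur; cur = cur->next) {
            if (cur->type == XML_ELEMENT_NODE &&
                xmlStrcmp(cur->name, (const xmlChar *)"geometry") == 0)
                n++;
            n += count_geometry(cur->children);
        }
        return n;
    }

    int main(int argc, char **argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s scene.dae\n", argv[0]);
            return 1;
        }
        xmlDocPtr doc = xmlReadFile(argv[1], NULL, 0);
        if (!doc) {
            fprintf(stderr, "failed to parse %s\n", argv[1]);
            return 1;
        }
        printf("%d geometry elements\n",
               count_geometry(xmlDocGetRootElement(doc)));
        xmlFreeDoc(doc);
        return 0;
    }

Which is the point of the format: one parser per engine instead of one exporter per DCC tool.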

Slides from the presentations:

Overview

COLLADA API

PSP Tool Chain

COLLADA Physics

FX Composer 2.0
 