MiniFAQ : How CELL works.

question is, is it so hard to design libraries to automatically synchronise the cores? or at least not leave it to the devs to MANUALLY handle synchronisation...

AFAIK synchronisation is done in hardware.
 
Isn't the PS2 like ~70 times more powerful? The PSone could do 300,000 textured polygons/s and the PS2 can do, let's say, 15-20 million polygons or more per second? But I haven't taken into account special effects like bilinear filtering, specular lighting, etc., so it should be even more powerful!


The PSone/PS1/PSX cannot display 300,000 textured polygons/sec.
That's how much Sega's Model 2 board can do, and the PS1 is not as powerful.


PS1 does:

1.5 million lines/sec (GTE)
500,000 transformed (GTE)
360,000 flat shaded polygons/sec displayed (GPU)
180,000 textured, Gouraud shaded, lit polygons/sec displayed (GPU)

The GTE (geometry transform engine) can transform 500,000 polygons/sec, but the GPU can only draw/display 360,000 flat shaded polys/sec, and 180,000 textured, Gouraud shaded and lit polys/sec. So 180k textured polygons/sec is the real figure for PS1, not 300k textured.

remember, a machine is only as strong as its weakest link.
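The weakest-link point is just a min() over the pipeline stages. A quick Python sketch using the PS1 figures above (illustrative only, obviously not PS1 code):

```python
# Sustained throughput of a serial pipeline is limited by its slowest stage.
# Figures are the PS1 numbers quoted above, in polys/sec.
def pipeline_rate(*stage_rates):
    """Effective rate of a serial pipeline: the slowest stage wins."""
    return min(stage_rates)

gte_transform = 500_000  # GTE: transformed polys/sec
gpu_textured = 180_000   # GPU: textured, Gouraud-shaded, lit polys/sec displayed

effective = pipeline_rate(gte_transform, gpu_textured)
print(effective)  # 180000 -- the GPU, not the GTE, sets the real figure
```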
 
...

To Gubbi

That's just a matter of time. When they've spent 6 months chasing down race conditions on a 32 APU system they'll embrace CSP-style programming with zeal.
Or sign up with Microsoft. There are limits to what underpaid and overworked developers can take.

To london-boy

the situation is, Cell will have multiple cores which can be used, much like PS2's VUs, to do whatever the developer wants.
It is not what developers want, it is what is handed over to them.

is it so hard to design libraries to automatically synchronise the cores? or at least not leave it to the devs to MANUALLY handle synchronisation...
I think you are misunderstanding the definition of "synchronization" as used in programming.

"Synchronization" means the correct order of variable access by several competing processes, not passing data from one processor to another as you seem to understand it.

Suppose you have two threads running in parallel, and the only means of inter-thread communication is a shared variable. While it does not matter which thread executes first, the order of shared-variable access matters for a correct result. To enforce that order, a signal variable determines which thread is permitted to access the shared variable.

Suppose you have thread0 and thread1 sharing a variable X and a signal S. When S reads 0, thread0 accesses the variable X, while thread1 gets its hands on X when S reads 1. As long as S reads 0, thread1 puts itself into a wait state while thread0 proceeds to read X. After thread0 is done with X, it sets S to 1 and waits in turn. Now it is thread1 that gets exclusive access to X while thread0 keeps waiting.
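The signal-variable scheme above can be sketched in Python (illustrative only, obviously not CELL or PS2 code; a condition variable is used here so the waiting thread sleeps instead of spinning on S):

```python
import threading

X = 0   # the shared variable
S = 0   # the "signal": 0 -> thread0's turn, 1 -> thread1's turn
cond = threading.Condition()

def worker(my_turn, next_turn, rounds):
    global X, S
    for _ in range(rounds):
        with cond:
            while S != my_turn:   # not our turn: the "wait state"
                cond.wait()
            X += 1                # exclusive access to X
            S = next_turn         # hand X over to the other thread
            cond.notify_all()

t0 = threading.Thread(target=worker, args=(0, 1, 5))
t1 = threading.Thread(target=worker, args=(1, 0, 5))
t0.start(); t1.start()
t0.join(); t1.join()
print(X)  # 10 -- ten strictly alternating increments, five per thread
```

Without S (or some equivalent lock), the two `X += 1` accesses could interleave in any order, and that unpredictable interleaving is exactly the race condition developers are left to chase down.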

Confusing??? Fortunately, this is parallel processing done the SMT way, and not how CELL functions.

In message passing, there are numerous processes running on various machines, each forming a server/client relationship with the others. While IE/Apache forms a one-server/many-clients relationship, the reverse is true in message passing: there are many servers for each client. When the client encounters a function that must be performed on a server, it calls the server via message passing and puts itself into a wait state. The server receives the message, processes it, then returns the result to the client, upon which the self-imposed wait is resolved and the client proceeds with the next instruction. Synchronization is not an issue with message passing, because the client waits while the call is processed on the server and there is no variable sharing between client and server processes. Yes, message passing is indeed much cleaner than threading, but developers are still forced to manually distribute functions across several processes and interconnect them via pipes to make the best use of the technology.
 
...

Real world performance figures.

PSX : 90~120K textured polygons/s
N64 : 100K textured polygons/s
DC : 3~4 million polys/s
PSX2 : 3~8 million polys/s
 
I would say this:

PSX: 120K~180K textured, g-shaded, lit pps
N64: 100K~160K textured pps w/ most features
DC: 3~5M pps w/ most features
PS2 5~10M pps w/ most features
 
Yeah, but that must just be in some FANTASY world! After all, I don't have any PS2 games that push those kind of numbers, so they must not exist!


:rolleyes:
 
:!: :!: :!:

Fantesy w0rld? wtf u talk bout n00b. are you hacking teh matrix???//

u a h4xorz?

 
cthellis42 said:
Yeah, but that must just be in some FANTASY world! After all, I don't have any PS2 games that push those kind of numbers, so they must not exist!


:rolleyes:

I'm pretty sure Jak&Daxter, Jak 2, R&C 1&2, Burnout, Burnout 2, etc. use more than 10-15 million polygons per second.
 
Sarcasm much? ;) Paul seemed to catch where I was coming from. Hehe...

I know perfectly well that there are many games in that range (and some with mutterings of numbers in the 18-20 range), which is why I am left to respond with dripping sarcasm and satire to those who undercut the machine by severe degrees just because they feel they MUST put it down. I know perfectly well most games do NOT push those numbers (on any platform), but that means dick-all when talking about capabilities. I also know perfectly well that achieving those numbers does not necessarily mean stability at those numbers at all times, and that it's REALLY hard to find all the functional information we would like.

Lastly, I know that as far as anything Sony-related is concerned, Deadmeat has the functional arguing capacity of a big pile of his namesake.
 
Berserk said:
I'm pretty sure Jak&Daxter, Jak 2, R&C 1&2, Burnout, Burnout 2,.... uses more than 10 - 15 million polygons per second.
and I'm pretty sure after a PA session with those games you will not be so sure anymore ;)
 
nAo said:
Berserk said:
I'm pretty sure Jak&Daxter, Jak 2, R&C 1&2, Burnout, Burnout 2,.... uses more than 10 - 15 million polygons per second.
and I'm pretty sure after a PA session with those games you will not be so sure anymore ;)

If you run J&D or R&C through a PA and look at the whole frame-time then you won't see those massive figures, because mostly the game runs well inside a frame, and not all scenes are efficiently tessellated. In many scenes, however, if you ignore the idle time and just look at the portion of the frame doing drawing, it'll easily hit 18M polys or so.

If you took a theoretical figure or an internal timer, as many games had to do before the PA, and as anyone has to do on another platform where a logic analyser isn't available, you'll see higher numbers - because they won't take into account all the subtle stalls and stuff that happen in a real situation.

It might be considered fairer to take the whole frame, idle time and all, and use that to calculate the rate, but I doubt very much anyone is being that fair on other platforms (it's not a useful number to measure) so you won't be comparing like with like.
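The gap between the two ways of measuring is easy to see with made-up numbers (illustrative only, not measured from any real title):

```python
# Polys/sec depends entirely on which time window you divide by.
frame_time = 1 / 60              # one 60 Hz frame, in seconds
busy_time = 0.40 * frame_time    # suppose drawing occupies 40% of the frame
polys_drawn = 120_000            # polys submitted during that frame

whole_frame_rate = polys_drawn / frame_time  # the "fair" whole-frame measure
busy_rate = polys_drawn / busy_time          # what the PA busy period shows

print(f"{whole_frame_rate / 1e6:.1f} Mpolys/s over the whole frame")
print(f"{busy_rate / 1e6:.1f} Mpolys/s while actually drawing")
```

Same frame, same polygons: 7.2 Mpolys/s one way, 18 Mpolys/s the other. Internal timers and theoretical figures on other platforms resemble the second number, not the first.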

Comparing *any* other source to PA data is dangerously flawed - as I've said, even just on PS2, PA data usually indicates a lower rate than you'd expect compared to internal instrumentation.

Even so, I bet J+D etc will be in double figures of millions much of the time, which is more than DC was/is capable of even in theoretical terms.

Even average PS2 titles will quite often hit the numbers a flat-out top DC title managed. And I'd expect a significant gap between those and an "average" DC title.

Anyone pretending otherwise really is clutching at straws.

I just roll my eyes every time I see someone pluck performance data out of thin air for some platform and then compare it to random PA figures from PS2.
 
MrWibble said:
If you run J&D or R&C through a PA and look at the whole frame-time then you won't see those massive figures because mostly the game runs well inside a frame, and not all scenes are efficiently tesselated. In many scenes however, if you ignore the idle-time and just look at the portion of the frame doing drawing, it'll easily hit 18M polys or so.
I obviously didn't look at the whole frame, but at the whole time VU1 is working, and I assumed this time is devoted to rendering (and it seems so via the GIF packet viewer).
I took several measurements in different levels and zones, and the maximum poly/s I observed was a 'mere' 14 Mpolys/s, reached by R&C, where the average is around 8/9 Mpolys/s.
Maybe I stressed the engine too much in the frames I analyzed ;)
To be fair I cannot run J&D on my PA, but I believe it should be quite as good as R&C, as they share the same tessellation technology, and most of the stuff in the scene is rasterized with that tech.

ciao,
Marco
 
nAo said:
MrWibble said:
If you run J&D or R&C through a PA and look at the whole frame-time then you won't see those massive figures because mostly the game runs well inside a frame, and not all scenes are efficiently tesselated. In many scenes however, if you ignore the idle-time and just look at the portion of the frame doing drawing, it'll easily hit 18M polys or so.
I obviously didn't look at the whole frame, but at the whole time VU1 is working, and I assumed this time is devoted to rendering (and it seems so via the GIF packet viewer).
I took several measurements in different levels and zones, and the maximum poly/s I observed was a 'mere' 14 Mpolys/s, reached by R&C, where the average is around 8/9 Mpolys/s.
Maybe I stressed the engine too much in the frames I analyzed ;)
To be fair I cannot run J&D on my PA, but I believe it should be quite as good as R&C, as they share the same tessellation technology, and most of the stuff in the scene is rasterized with that tech.

ciao,
Marco

It always looked to me that R&C did more work (morphing and stuff) than J&D, and thus gets lower polygon numbers overall - though I think it looks better as a result. Still, 14M is a pretty decent number, certainly well over the "Real world performance figures" proposed by DM.

J&D definitely did 18M in some scenes when I tested it. In fact I'm sure there's a presentation from a Sony guy from last years GDC-E where he showed a scan of a game doing a good number of polys. I'd put money on that scan being J&D :)

But either way DM is talking out of his rear...
 
ChryZ said:
MrWibble said:
J&D definitely did 18M in some scenes when I tested it. In fact I'm sure there's a presentation from a Sony guy from last years GDC-E where he showed a scan of a game doing a good number of polys. I'd put money on that scan being J&D :)
I think, it was this one ... <a href="http://www.research.scea.com/research/pdfs/GDC2003_Intro_Performance_Analyzer_18Mar03.pdf">GDC2003_Intro_Performance_Analyzer_18Mar03.pdf</a>

Actually I was thinking of an earlier talk by some European dude. However, that's a pretty good one too, and it actually illustrates what I was saying about calculating the performance rate over the whole frame, including the idle time, giving a much lower rate than you'd see if you just looked at the actual busy period.

Yeah, the "optimal" scan they use is almost certainly R&C - especially if you look at the packet viewer samples later on, where they clearly have some R&C artwork... (and look at the number of polys in the scene - no wonder it's running slightly slower than J&D!)

They're claiming only 7M polys in the scene they've chosen, but then it looks to be just the first scene in the game, which almost certainly isn't the busiest, and they're including the idle time - which is probably a good proportion of the frame.

It's probably chewing through the scene at almost double that when it's actually busy (from looking at the scan they show, and having played with this particular title myself... ahem)

The skinned stuff, and the morphing stuff, probably comes through slower than the static bits. I reckon that explains the lower rate in R&C over J&D.
 
ChryZ said:
I think, it was this one ...

That is one highly interesting pdf, thanks! ^_^ Unless I skipped over it, though, I didn't notice anything much about J&D nor ~18 million polys. But it doesn't make the link any less interesting.
 
Paul said:
PSX2 : 3~8 million polys/s

There are PS2 games that push more than 15 million, some in excess of 20.


You should try running some of the games that claim these numbers through a performance analyser, you'd be surprised at the results.
 
ERP said:
Paul said:
PSX2 : 3~8 million polys/s

There are PS2 games that push more than 15 million, some in excess of 20.


You should try running some of the games that claim these numbers through a performance analyser; you'd be surprised at the results.

Yeah. It's a pity there isn't a PA for other systems to expose the wild exaggerations (or just over-optimistic profiling) that go on for them too...

I suspect that most of the people making bold claims would be equally surprised by PA data.
 
Yeah. It's a pity there isn't a PA for other systems to expose the wild exaggerations (or just over-optimistic profiling) that go on for them too...

I suspect that most of the people making bold claims would be equally surprised by PA data.

You can get stats directly from the GPU on Xbox, including pixels drawn and triangles processed. You don't get any nice bus-usage stats, but those are less useful in this context. GC has similar built-in performance counters.

And FWIW we weren't at all surprised by our PA results. There is some question as to what is classified as a triangle by the PA (i.e. at what point in the pipeline it's measuring), but from a dev standpoint the total tri count isn't really useful data.
 