PS3 distributed computing without internet limitations question.

Shared memory with no snoop controller and local pools are also really scary things

ARM and NEC agree with you. :LOL:

[Image: ARM MPCore architecture diagram]


http://www.linuxdevices.com/news/NS3610443018.html
 
MfA said:
Snooping is like superscalar processing ... good hardware for stupid software.

Yes, and it allows game developers to focus more time and resources on creating a good game instead of pulling their hair out trying to find ways to get simple stuff to work.
 
Mostly it just allows them to stick to what they know; unfortunately, what they know is how not to create parallel software reliably.

The only data in memory which really needs snooping is things like mutexes. The lack of more dedicated hardware support for such mechanisms, and of the ability to tie them to specific regions of memory, forces strict ordering and cache coherency onto every individual memory access.
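To make that concrete, here's a minimal C11 sketch of the point (generic code, nothing PS3/Cell-specific): the lock word is the only location that genuinely needs coherent, strictly ordered access, while the bulk data it guards is only ever touched under the lock and could in principle be kept consistent with explicit flushes of just that region.

```c
#include <stdatomic.h>

/* The lock word is the only location that truly needs hardware-enforced
 * coherency and ordering; the payload it guards is only touched while the
 * lock is held, so in principle it could be synchronized explicitly. */
typedef struct {
    atomic_flag lock;           /* needs coherent, ordered access          */
    float       payload[1024];  /* bulk data, only accessed under the lock */
} shared_block_t;

static void block_lock(shared_block_t *b)
{
    /* acquire: spin on the coherent lock word */
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ; /* spin */
}

static void block_unlock(shared_block_t *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}
```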

If we are going to use area-efficient cores, sticking to a non-scalable method of maintaining coherency makes little sense. Requiring two separate methods, one for local SMP-type nodes and one for clustering those nodes, adds more complexity than just replacing snooping with something scalable.
 
Why wouldn't it be able to scale? I don't see it being that complex at all, not to mention overall efficiency would probably be a lot higher.
 
Megadrive1988 said:
So basically, realtime distributed computing for rendering just is not going to happen for time-critical, highly detailed action games, any more than raytracing was going to happen on Ultra 64.

More correctly: it would take a paradigm shift to get it working. One thing is for sure, most games won't get speed increases by plugging in a couple of PS3s and a Cell TV.

Well, at least in the early days :) If Cell does work, then it's possible that a few years after release some titles might start to re-architect to work like this. But you can say goodbye to cross-platform games.

Multi-processing is going to be as big a paradigm shift for game devs as ASM to C was back at the 16-to-32-bit transition. Distributed multi-processing is another shift just as difficult again.
 
Panajev2001a said:
If you had three black-boxes and each processed at the same throughput of 100 MVertices/s at 10 bytes per vertex we are in trouble if we expect to transfer the stream of data from one box to the next.
A bit of a tangent maybe, but how many bytes does an average vertex take up in a modern videogame? Sometimes it's quoted at 40 bytes, other sources say as low as 8 bytes (sounds very low to me).
 
I'm pretty naive on this because I always think from a spatial/visual coder perspective, but anyway.

Hm, how is I/O done in PS3? I guess it is memory mapped, so a cell could just write "read stuff from the internet into here" and the cell could somehow be notified when the memory is ready.
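Something along these lines, maybe; a minimal C sketch where the request block layout, field names and polling mechanism are all invented for illustration, not taken from any real PS3/Cell documentation:

```c
#include <stdint.h>

/* Hypothetical memory-mapped request block - purely illustrative. */
typedef volatile struct {
    uint64_t src_handle;   /* e.g. a network stream descriptor            */
    uint64_t dst_addr;     /* "read stuff from the internet into here"    */
    uint32_t length;       /* bytes requested                             */
    uint32_t status;       /* 0 = idle, 1 = busy, 2 = done (device-set)   */
} io_request_t;

#define IO_BUSY 1u
#define IO_DONE 2u

/* Post a request and poll for completion; a real system would presumably
 * notify via an interrupt or message instead of spinning. */
static void fetch_into(io_request_t *req, uint64_t handle,
                       uint64_t dst, uint32_t len)
{
    req->src_handle = handle;
    req->dst_addr   = dst;
    req->length     = len;
    req->status     = IO_BUSY;      /* kicks off the transfer        */

    while (req->status != IO_DONE)
        ;                           /* wait until the memory is ready */
}
```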

One simple scenario is a cell that contains a decoder for some compressed video that a DVD reader sends to the TV (to limit bandwidth).

Which I think is key: being able to send apulets means you can decide the format yourself. Need only to render lines? Then send transformed 2D vertices to the line renderer.

But I expect the interconnects to be at least FireWire.
 
DeanoC said:
Multi-processing is going to be as big a paradigm shift for game devs as ASM to C was back at the 16-to-32-bit transition. Distributed multi-processing is another shift just as difficult again.

Not only that, but the difference in architecture between the next-generation consoles, all of which are "multi-processing", is huge.

The programming model for CELL is still undisclosed, but from the patents the most probable solution is a CSP style message passing model (transputers anyone?).

The one big problem with CELL is going to be how to slice and dice your software so that it can run on a given number of PEs/APUs. The number of APUs in a system is fixed, as is the amount of local storage attached to each APU; AFAIK no virtualization is implemented (of either PEs/APUs or storage). So programs and data have to be squeezed into packets with a fixed (small) upper size.
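A minimal C sketch of what that constraint looks like in practice; the 128 KB local-store size and the 16 KB code budget are assumptions picked for illustration, not confirmed figures from the patents:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes, for illustration only: the point is simply that code,
 * data and working space all have to fit a fixed, small upper bound. */
#define LOCAL_STORE_SIZE  (128 * 1024)
#define PACKET_CODE_MAX   ( 16 * 1024)
#define PACKET_DATA_MAX   (LOCAL_STORE_SIZE - PACKET_CODE_MAX)

typedef struct {
    uint8_t code[PACKET_CODE_MAX];   /* the apulet program      */
    uint8_t data[PACKET_DATA_MAX];   /* its entire working set  */
} apu_packet_t;

/* Splitting a large dataset therefore means carving it into packet-sized
 * chunks up front; anything that doesn't dice cleanly is a problem. */
static size_t packets_needed(size_t dataset_bytes)
{
    return (dataset_bytes + PACKET_DATA_MAX - 1) / PACKET_DATA_MAX;
}
```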

Another problem is the memory model. Some data dices into fixed-size chunks very well. Other data doesn't: any kind of tree structure or sparse matrix structure is going to be a performance killer on CELL. The individual APUs can only communicate with memory through DMA (and only to the on-die EDRAM), which means they will stall on every memory dependency. On traditional in-order CPUs you could circumvent the stall penalty by doing multiple searches simultaneously (this explicit multi-threading was proposed by MfA last time we discussed this). However it is unclear (and unlikely) that each APU can do overlapping DMA transfers. Most likely you'll have to set a DMA transfer up, wait for it to complete, then set the next one up - in other words, pay the full latency penalty on every dependency.

Another problem is that only the PEs can set up main memory transfers, which means that either you have to 1) load the entire dataset into EDRAM before the APUs go to work on it, or 2) suffer a lot of APU<->PE communication.
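To illustrate the difference, here's a rough C sketch; dma_start()/dma_wait() are invented primitives standing in for whatever the real APU DMA interface turns out to be, and both loops assume the total size is a multiple of the chunk size for brevity. The serialized loop pays full latency on every chunk; the double-buffered one only works if an APU can actually keep two transfers in flight, which as said above is unclear.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DMA primitives - names and semantics are invented.
 * dma_wait() is assumed to block until all pending transfers finish. */
extern void dma_start(void *local_dst, uint64_t remote_src, size_t bytes);
extern void dma_wait(void);
extern void process(const uint8_t *chunk, size_t bytes);

#define CHUNK 16384

/* Serialized: set the transfer up, wait, work, repeat.
 * Every iteration eats the full DMA latency. */
static void walk_serialized(uint64_t src, size_t total, uint8_t *buf)
{
    for (size_t off = 0; off < total; off += CHUNK) {
        dma_start(buf, src + off, CHUNK);
        dma_wait();                     /* full latency penalty, every time */
        process(buf, CHUNK);
    }
}

/* Double-buffered: fetch chunk N+1 while working on chunk N.
 * Only possible if overlapping transfers are supported at all. */
static void walk_double_buffered(uint64_t src, size_t total, uint8_t *buf[2])
{
    dma_start(buf[0], src, CHUNK);
    for (size_t off = 0; off < total; off += CHUNK) {
        int cur = (int)((off / CHUNK) & 1);
        dma_wait();                             /* current chunk is in      */
        if (off + CHUNK < total)                /* prefetch the next chunk  */
            dma_start(buf[cur ^ 1], src + off + CHUNK, CHUNK);
        process(buf[cur], CHUNK);               /* overlaps with the fetch  */
    }
}
```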

The one huge advantage CELL has is that it puts lots and lots and lots of execution units on a die with very little control-logic overhead. And PS3 developers are sure to find ways to use these.

In contrast to this, Microsoft has opted for the tried and tested approach. While also multi-processing, XBox Next is going to have a traditional process/thread programming model with shared, demand-loaded caches. So while having a substantially lower number of execution units, it is likely to have a lot higher utilization of those units.

What is guaranteed is that there will be a lot of apples-to-oranges comparisons of these two systems :cry:

Cheers
Gubbi
 
Gubbi said:
So while having a substantially lower number of execution units, it is likely to have a lot higher utilization of those units.
Fewer resources => higher utilization is pretty much a natural law. Or at least Amdahl's. ;) However if we calculate (work/die area) as our utilization-per-resource quota, the BBE may well have the advantage. At least, that's the general idea with the Cell approach in the first place. It does seem reasonable, particularly as the number of integrated processors per die grows to 16 and beyond.
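To put purely illustrative numbers on that (invented figures, not real data for either chip): if design A spends its area on 3 big cores averaging 80% utilization and design B spends the same area on 8 small cores averaging 50%, A delivers 3 × 0.8 = 2.4 cores' worth of work while B delivers 8 × 0.5 = 4.0. Lower utilization per unit, but more work per mm².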
What is guaranteed is that there will be a lot of apples-to-oranges comparisons of these two systems :cry:
Truer words were rarely spoken.
I hope this thread has effectively punctured high-flying hopes of super-gaming by hooking up several PS3s to each other. DeanoC nailed it in his first post. Data locality is the killer. That doesn't mean that there aren't instances where distributed computing is possible. There are Beowulf clusters all over the place, and the Cell concept definitely has greater promise. I come from computational science, and have almost no interest whatsoever in the PS3 as a gaming device. But no number cruncher worth his salt could see that chip architecture and not go "Hmmmm. Interesting".
 
Entropy said:
I come from computational science, and have almost no interest whatsoever in the PS3 as a gaming device. But no number cruncher worth his salt could see that chip architecture and not go "Hmmmm. Interesting".

That's the thing. It remains to be seen how all these "numbers" can ultimately be used in a game... Because in the end, PS3 WILL be a gaming device first and foremost.
 
london-boy said:
Entropy said:
I come from computational science, and have almost no interest whatsoever in the PS3 as a gaming device. But no number cruncher worth his salt could see that chip architecture and not go "Hmmmm. Interesting".

That's the thing. It remains to be seen how all these "numbers" can ultimately be used in a game... Because in the end, PS3 WILL be a gaming device first and foremost.
Of course.
But the Cell concept and suitable chip implementations are not limited to the PS3. While the gaming focus of this forum is quite appropriate, it is still important to remember that. I'd say that the BBE is the first stab at implementing an architecture suitable for very high levels of parallelism in a consumer-level computing device intended for fairly general-purpose applicability and for production in high volumes. Put another way, it is the first mainstream challenge we've seen to the classical computer architectures. It's an architecturally smaller step from the dear old Univac mainframe (where I did my first real programming) to a modern desktop PC than from that PC to the PS3.

Exciting stuff, not just for gamers.
 
I agree, but sadly, until Cell (or any other architecture, including PowerPC etc) runs Windows (and it's not gonna happen anytime soon), PCs will stay there as they are for a LOOOOOOOOOOOOOOOONG time...
 
london-boy said:
I agree, but sadly, until Cell (or any other architecture, including PowerPC etc) runs Windows (and it's not gonna happen anytime soon), PCs will stay there as they are for a LOOOOOOOOOOOOOOOONG time...
Exactly.
Which is precisely why this is so important IMO.
It circumvents the architectural inertia of the Wintel hegemony, but still targets fairly general applicability and high volumes.
 
PC-Engine said:
Why wouldn't it be able to scale? I don't see it being that complex at all, not to mention overall efficiency would probably be a lot higher.

Because of bandwidth and latency concerns mostly; this sums up some of the problems and proposes an alternative solution to the traditional directory-based approaches (of which I only really like reactive-NUMA).
 
Squeak said:
A bit of a tangent maybe, but how many bytes does an average vertex take up in a modern videogame? Sometimes it's quoted at 40 bytes, other sources say as low as 8 bytes (sounds very low to me).
It depends upon implementation.
On the PS2 my biggest vertex is 15 bytes, and it has position, normal, a UV mapping, and a couple of vertex colors (which can be interpolated between daytime and nighttime).
In my case all data is quantized and compressed.
A non-compressed vertex can take up a lot of space... it's easy to end up with 40-byte vertices.
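For a rough idea of how that kind of budget works out, here's a C sketch of a possible layout; the exact field split is invented, not nAo's actual PS2 format:

```c
#include <stdint.h>

/* Invented layout, just to show how a position/normal/UV/two-colors vertex
 * can land around 15 bytes once everything is quantized. Positions and UVs
 * are fixed-point values rescaled at transform time; normals and the
 * (assumed palettized) colors are 8-bit. */
#pragma pack(push, 1)
typedef struct {
    int16_t x, y, z;        /* quantized position,          6 bytes */
    int8_t  nx, ny, nz;     /* quantized normal,            3 bytes */
    uint8_t u, v;           /* quantized UV,                2 bytes */
    uint8_t day_color[2];   /* e.g. palettized day color,   2 bytes */
    uint8_t night_color[2]; /* e.g. palettized night color, 2 bytes */
} packed_vertex_t;          /* 15 bytes total                       */
#pragma pack(pop)
```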
 
Entropy said:
london-boy said:
I agree, but sadly, until Cell (or any other architecture, including PowerPC etc) runs Windows (and it's not gonna happen anytime soon), PCs will stay there as they are for a LOOOOOOOOOOOOOOOONG time...
Exactly.
Which is precisely why this is such an important event.
It circumvents the inertia of the Wintel hegemony.

Does it?
Until Mr Joe Black can choose between an Intel, AMD AND SonyCell CPU for his brand spanking new PC, and until he can choose between Nvidia, ATI and SonyCell for the GPU part, nothing's gonna get circumvented. Sadly.
 
london-boy said:
Entropy said:
london-boy said:
I agree, but sadly, until Cell (or any other architecture, including PowerPC etc) runs Windows (and it's not gonna happen anytime soon), PCs will stay there as they are for a LOOOOOOOOOOOOOOOONG time...
Exactly.
Which is precisely why this is such an important event.
It circumvents the inertia of the Wintel hegemony.

Does it?
Until Mr Joe Black can choose between an Intel, AMD AND SonyCell CPU for his brand spanking new PC, and until he can choose between Nvidia, ATI and SonyCell for the GPU part, nothing's gonna get circumvented. Sadly.

Not in Windows PC-space, no, not likely.
But that's kind of the point.
 
Squeak said:
A bit of a tangent maybe, but how many bytes does an average vertex take up in a modern videogame? Sometimes it's quoted at 40 bytes, other sources say as low as 8 bytes (sounds very low to me).
Like nAo says, it's up to the implementation.
The smallest vertices with 3 or more vertex attributes that we use are 7 bytes/vertex, and that's still all done with "trivial" compression (scalar/vector quantization, sometimes with delta offsets).
Anyway, once you start going to more exotic schemes (not feasible with current hw, but next-gen, who knows) you go down to a couple of bits per vertex, and lower...
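As a rough sketch of the delta-offset idea mentioned above (an invented scheme, not our actual format): store one position at full quantized precision and then only small signed offsets to the previous vertex, which drops the per-vertex cost well below a full quantized position when the mesh has good locality.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t x, y, z; } qpos_t;      /* 6-byte base position */
typedef struct { int8_t dx, dy, dz; } qdelta_t;  /* 3-byte delta         */

/* Rebuild a strip of quantized positions from a base vertex plus deltas.
 * Assumes count >= 1 and that all deltas fit in 8 bits. */
static void decode_strip(const qpos_t *base, const qdelta_t *deltas,
                         size_t count, qpos_t *out)
{
    out[0] = *base;
    for (size_t i = 1; i < count; ++i) {
        out[i].x = out[i - 1].x + deltas[i - 1].dx;
        out[i].y = out[i - 1].y + deltas[i - 1].dy;
        out[i].z = out[i - 1].z + deltas[i - 1].dz;
    }
}
```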
 