Some PlayStation 3 performance guesses/estimates (wild guesses)

london-boy said:
I think there will be a decently powerful middleware solution available to anyone wanting to develop a game for it, although some of the biggest developers, with bigger budgets and more time on their hands, will no doubt develop their own engines...

The management of the data flow to and from all those VUs should be interesting. I sure would like to know how they plan to achieve that.
 
fbg1 said:
london-boy said:
I think there will be a decently powerful middleware solution available to anyone wanting to develop a game for it, although some of the biggest developers, with bigger budgets and more time on their hands, will no doubt develop their own engines...

The management of the data flow to and from all those VUs should be interesting. I sure would like to know how they plan to achieve that.


Well, there are 2 options here:

1) (very unlikely) Sony goes the PS2 route, completely f**king up the initial documentation, in which case early PS3 titles will look like complete trash compared to what the same hardware will put out in its 3rd or 4th generation.

2) (very likely) Sony, with the help of IBM, develops a nice set of libraries and development tools to make early code look decent until developers have enough expertise to experiment with the hardware later in the PS3's life...

3) Someone decides to nuke us all, in which case we will all be dead by the time the PS3 comes out.

4) An alien race comes to Earth, forms an alliance with the human race and shares its technology, in which case the PS3 will probably be a quantum computer with the next generation of Blu-ray and no need for wires, not even electrical ones, thanks to some nuclear fusion battery inside the thing. All at 20 quid.

5) An alien race comes to Earth, decides to destroy the simple-minded human race and pulverises us all.

now u choose...
 
The management of the data flow to and from all those VUs should be interesting. I sure would like to know how they plan to achieve that.

With a 1024-bit bus, what data flow is there to manage?
 
london-boy said:
now u choose...

heh. let me guess... #2?

Actually, I assumed that with IBM doing the design work, they'd have corresponding libraries as well. I doubt the PS3 launch will suffer the way the PS2's did.
 
fbg1 said:
london-boy said:
now u choose...

heh. let me guess... #2?

Actually, I assumed that with IBM doing the design work, they'd have corresponding libraries as well. I doubt the PS3 launch will suffer the way the PS2's did.



Yeah, and I already heard something about libraries or middleware or something like that...

Everyone learns from their mistakes; I'm sure Sony learned from theirs. They are also in a pretty good position, where pretty much every mistake they make is forgiven, given the HUGE user base and brand recognition they have...

Also, it would be interesting to see Sony with the best hardware for once... not that it would change anything if they don't... it never did.
 
V3 said:
With a 1024-bit bus, what data flow is there to manage?

Dataflow isn't the right word. What I mean is, the Cell will be doing a lot of parallel computing. Massively parallel computing is fine for running scientific computations, but for real-time games??? Synchronizing and optimizing that parallelization poses an interesting problem. I wonder how it will be managed.
 
Massively parallel computing is fine for running scientific computations, but for real-time games??? Synchronizing and optimizing that parallelization poses an interesting problem. I wonder how it will be managed.

They have already patented their solution; you can read up on it in the patent.

This is something they designed from the ground up to be massively parallel for real-time purposes, not just a massive number of Intel CPUs put together.

Simple, you need to make sure that bus is always filled... then you start worrying about filling it with useful data...

:) If it is not filled, you don't need a lot of managing; you only need to start managing when it's full.
 
Parallelization is not a hardware problem; it isn't even a compilation problem. At those levels the solutions are quite straightforward: there are already a lot of giants on whose shoulders to stand, and the people working on them have a good grasp of the issues. The problem is software development, or more specifically, software developers.
 
Simple, you need to make sure that bus is always filled... then you start worrying about filling it with useful data...

:) If it is not filled, you don't need a lot of managing; you only need to start managing when it's full.

Uhm, no... if you do not manage your crop well, by working the land, planting the seeds, irrigating it, etc. (which is managing things ;) ), you will not have much crop to fill the big silos...

You do need to manage things nicely: each DMAC can address a minimum of 1 Kbit at a time, which is not a small chunk... you do not want to transfer 10 bytes of data each cycle on that 1,024-bit bus ;)
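To put rough numbers on the "10 bytes on a 1,024-bit bus" point, here is a minimal Python sketch of how much of each transfer is useful payload. It assumes every transfer moves whole 1,024-bit (128-byte) blocks, which is a reading of the patent's "smallest addressable unit ... 1024 bits", not a documented DMA rule; the function names are made up for illustration.

```python
# Sketch: how much of each 1,024-bit transfer is useful payload?
# Assumption: every transfer moves whole 1,024-bit (128-byte) blocks,
# following the patent's "smallest addressable unit ... 1024 bits".
BLOCK_BITS = 1024
BLOCK_BYTES = BLOCK_BITS // 8  # 128 bytes


def blocks_needed(payload_bytes: int) -> int:
    """Number of whole 128-byte blocks needed to carry the payload (ceiling division)."""
    return -(-payload_bytes // BLOCK_BYTES)


def bus_efficiency(payload_bytes: int) -> float:
    """Fraction of the transferred bits that are actual payload."""
    if payload_bytes <= 0:
        return 0.0
    return payload_bytes / (blocks_needed(payload_bytes) * BLOCK_BYTES)


for size in (10, 128, 200, 1024):
    print(f"{size:5d} bytes -> {blocks_needed(size)} block(s), "
          f"{bus_efficiency(size):.1%} of the bus used")
```

Under these assumptions a 10-byte transfer still occupies a full block, so only about 7.8% of the bus width carries data, while any multiple of 128 bytes uses all of it.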
 
Uhm, no... if you do not manage your crop well, by working the land, planting the seeds, irrigating it, etc. (which is managing things), you will not have much crop to fill the big silos...

Planting the seeds, irrigating, etc. is more like creating your artwork, engine, SFX and music. You do need to manage this, or your game won't be finished on time ;)

If your silo is not filled to the max, you can pretty much put your harvest in there, but if it is full, then you would need to take some of the earlier/later part and look elsewhere.

It's like having an overzealous traffic controller at a quiet intersection or on a fast highway.
 
only if several trucks leave at the same time ;)

OK... let's end Harvesting 101 here :LOL:

The thing about wide busses is that if you use them to read/write small chunks, you are wasting the bus's efficiency...

If you read the patent, the smallest addressable unit in the e-DRAM is 1,024 bits, which means every transfer uses the full bus width; if we do not package our data nicely, we will waste potential bandwidth...
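A toy illustration of "packaging data nicely": pack small records into 128-byte blocks before sending them, rather than one record per transfer. This is a hypothetical greedy, in-order packer (records never straddle a block boundary); the record sizes are invented, and nothing here comes from Sony's or IBM's tools.

```python
# Sketch: pack small records into 1,024-bit (128-byte) blocks before transferring,
# instead of sending each record on its own. Hypothetical greedy packer:
# records stay in order and are never split across blocks.
from typing import List

BLOCK_BYTES = 1024 // 8  # 128 bytes per 1,024-bit block


def pack_records(sizes: List[int]) -> List[List[int]]:
    """Fill 128-byte blocks with records in order; returns the record sizes per block."""
    blocks: List[List[int]] = []
    current: List[int] = []
    used = 0
    for size in sizes:
        if size > BLOCK_BYTES:
            raise ValueError(f"record of {size} bytes does not fit in one block")
        if used + size > BLOCK_BYTES:  # this record would overflow: start a new block
            blocks.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        blocks.append(current)
    return blocks


records = [16] * 40                        # forty hypothetical 16-byte records
naive_transfers = len(records)             # one block per record
packed_transfers = len(pack_records(records))
print(naive_transfers, packed_transfers)   # 40 5
```

Forty 16-byte records need 40 transfers if sent one at a time, but only 5 once packed eight to a block. A real scheme would also have to care about alignment and about which records are actually needed together.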

Also, you do not want to take bandwidth TOTALLY for granted either and try to push too much stuff through the big 1,024-bit bus...

We need to manage how data is loaded (remember, we have external memory and an optical disc to read/write data from/to...); we need to pay attention to proper data flow...

e-DRAM size is limited, and all execution is done from the GPRs and the Local Storage SRAM in each APU (128 KB per APU); as I said, we also have to pay attention to bottlenecks elsewhere (where the busses are thinner...)...
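One common way to keep an APU busy with only 128 KB of local storage is double buffering: split the store into two halves and compute on one while the DMA engine fills the other. The patent excerpts here do not describe this; it is an assumed technique, and the sketch below only models the ordering (Python has no DMA, so the "fetch" is a plain slice copy that runs before the compute step rather than alongside it). `stream_sum` is a hypothetical name.

```python
# Sketch: streaming a large buffer through one APU's 128 KB local storage.
# Assumption (not from the patent): the store is split into two 64 KB halves,
# so the next chunk can be fetched while the current one is processed.
# Here the "DMA fetch" is just a slice copy executed in sequence.
LOCAL_STORE_BYTES = 128 * 1024
HALF = LOCAL_STORE_BYTES // 2  # 64 KB per buffer


def stream_sum(data: bytes) -> int:
    """Sum all bytes of `data`, holding at most two 64 KB chunks at a time."""
    n_chunks = -(-len(data) // HALF)
    if n_chunks == 0:
        return 0
    buffers = [data[0:HALF], b""]  # prime the first buffer
    total = 0
    for i in range(n_chunks):
        cur, nxt = i % 2, 1 - i % 2
        # "Start" fetching chunk i+1 into the other half...
        if i + 1 < n_chunks:
            buffers[nxt] = data[(i + 1) * HALF:(i + 2) * HALF]
        # ...while computing on chunk i, which is already resident.
        total += sum(buffers[cur])
    return total


data = bytes(range(256)) * 1000  # 256,000 bytes, about 2x the local store
print(stream_sum(data) == sum(data))
```

Any stream larger than 128 KB goes through in 64 KB pieces, and the result matches summing the whole buffer at once; on real hardware the point is that the fetch and the compute overlap instead of alternating.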
 
Panajev,

I believe it has been stated already that the 1 kbit main bus of the Cell is partitioned up like a crossbar controller, and there is most likely a bus arbiter that helps keep the various "lanes" of that highway filled by queuing requests in an out-of-order fashion, load-balancing, etc. It's not likely Sony would just slap a dumb 1 kbit connection onto the chip and then let efficiency go down the tubes unless each transfer is an even multiple of 1 kbit.

*G*
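The out-of-order queuing idea can be shown with a toy model. This is only a sketch of the speculated arbiter, not the patent's design: the bank count, the 2-cycle bank busy time and the one-request-per-cycle issue rate are all hypothetical. The in-order arbiter stalls whenever the head request targets a busy bank; the out-of-order one skips ahead to any request whose bank is idle.

```python
from collections import deque
from typing import List


def simulate(request_banks: List[int], n_banks: int, busy_cycles: int,
             out_of_order: bool) -> int:
    """Return the cycle at which the last request's bank access completes.

    Hypothetical model: at most one request is issued per cycle, and a bank
    stays busy for `busy_cycles` cycles after accepting a request.
    """
    queue = deque(request_banks)
    free_at = [0] * n_banks  # first cycle at which each bank is idle again
    cycle = 0
    last_done = 0
    while queue:
        # In-order: only the head may issue. Out-of-order: any queued request.
        window = len(queue) if out_of_order else 1
        for idx in range(window):
            bank = queue[idx]
            if free_at[bank] <= cycle:
                del queue[idx]
                free_at[bank] = cycle + busy_cycles
                last_done = max(last_done, free_at[bank])
                break  # one issue per cycle
        cycle += 1
    return last_done


requests = [0, 0, 1, 1]
print(simulate(requests, n_banks=2, busy_cycles=2, out_of_order=False))
print(simulate(requests, n_banks=2, busy_cycles=2, out_of_order=True))
```

For requests hitting banks [0, 0, 1, 1], the in-order arbiter finishes at cycle 7 and the out-of-order one at cycle 5, because bank 1 gets work while bank 0 is still busy.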
 
You are right... it is not 1,024 bits, it is 512 bits ;) (well, in one scenario it is only 512 bits: if we interleave the memory blocks over different memory banks and store 512 bits in each block).

From the patent:

[0081] FIG. 12A illustrates the control system and structure for the DRAM of a BE. A similar control system and structure is employed in processors having other sizes and containing more or less PEs. As shown in this figure, a cross-bar switch connects each DMAC 1210 of the four PEs comprising BE 1201 to eight bank controls 1206. Each bank control 1206 controls eight banks 1208 (only four are shown in the figure) of DRAM 1204. DRAM 1204, therefore, comprises a total of sixty-four banks. In a preferred embodiment, DRAM 1204 has a capacity of 64 megabytes, and each bank has a capacity of 1 megabyte. The smallest addressable unit within each bank, in this preferred embodiment, is a block of 1024 bits.

[0082] BE 1201 also includes switch unit 1212. Switch unit 1212 enables other APUs on BEs closely coupled to BE 1201 to access DRAM 1204. A second BE, therefore, can be closely coupled to a first BE, and each APU of each BE can address twice the number of memory locations normally accessible to an APU. The direct reading or writing of data from or to the DRAM of a first BE from or to the DRAM of a second BE can occur through a switch unit such as switch unit 1212.

[0083] For example, as shown in FIG. 12B, to accomplish such writing, the APU of a first BE, e.g., APU 1220 of BE 1222, issues a write command to a memory location of a DRAM of a second BE, e.g., DRAM 1228 of BE 1226 (rather than, as in the usual case, to DRAM 1224 of BE 1222). DMAC 1230 of BE 1222 sends the write command through cross-bar switch 1221 to bank control 1234, and bank control 1234 transmits the command to an external port 1232 connected to bank control 1234. DMAC 1238 of BE 1226 receives the write command and transfers this command to switch unit 1240 of BE 1226. Switch unit 1240 identifies the DRAM address contained in the write command and sends the data for storage in this address through bank control 1242 of BE 1226 to bank 1244 of DRAM 1228. Switch unit 1240, therefore, enables both DRAM 1224 and DRAM 1228 to function as a single memory space for the APUs of BE 1222.

[0084] FIG. 13 shows the configuration of the sixty-four banks of a DRAM. These banks are arranged into eight rows, namely, rows 1302, 1304, 1306, 1308, 1310, 1312, 1314 and 1316 and eight columns, namely, columns 1320, 1322, 1324, 1326, 1328, 1330, 1332 and 1334. Each row is controlled by a bank controller. Each bank controller, therefore, controls eight megabytes of memory.
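The bank arithmetic in [0081] and [0084] checks out: 8 bank controllers × 8 banks × 1 MB = 64 MB, and 8 MB per controller. The patent does not say how addresses map onto banks, so the sketch below compares two hypothetical mappings: linear (each 1 MB bank is contiguous) and block-interleaved (consecutive 1,024-bit blocks go to consecutive banks).

```python
from typing import Tuple

BLOCK_BYTES = 128             # one 1,024-bit block
BANK_BYTES = 1024 * 1024      # 1 MB per bank
N_CONTROLLERS = 8             # one bank controller per row
BANKS_PER_CONTROLLER = 8
N_BANKS = N_CONTROLLERS * BANKS_PER_CONTROLLER  # 64 banks
DRAM_BYTES = N_BANKS * BANK_BYTES               # 64 MB


def decode(address: int, interleaved: bool) -> Tuple[int, int, int]:
    """Return (bank_controller, bank_in_row, block_index_within_bank).

    Both mappings are hypothetical; the patent only gives the sizes.
    """
    if not 0 <= address < DRAM_BYTES:
        raise ValueError("address outside the 64 MB DRAM")
    block = address // BLOCK_BYTES
    if interleaved:
        # consecutive blocks rotate through all 64 banks
        bank, block_in_bank = block % N_BANKS, block // N_BANKS
    else:
        # each bank holds one contiguous megabyte (8,192 blocks)
        blocks_per_bank = BANK_BYTES // BLOCK_BYTES
        bank, block_in_bank = divmod(block, blocks_per_bank)
    controller, bank_in_row = divmod(bank, BANKS_PER_CONTROLLER)
    return controller, bank_in_row, block_in_bank


for addr in (0, 128, 8 * 128, BANK_BYTES):
    print(addr, decode(addr, interleaved=False), decode(addr, interleaved=True))
```

Under the interleaved mapping, address 128 lands in a different bank than address 0, and every 8 blocks the access moves on to the next bank controller, so a long sequential transfer is spread across controllers; under the linear mapping the same transfer stays in one bank for a full megabyte.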

The cross-bar switch is there because we have 4 PEs accessing the same e-DRAM inside the Broadband Engine... we have 4 DMACs, 1 in each PE (and we have several banks and several bank controllers).
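Going back to [0082]-[0083], the two-BE case can be pictured as one address space split between two DRAMs. The layout below (BE 0's DRAM at [0, 64 MB), BE 1's at [64 MB, 128 MB)) is an assumption for illustration; the patent only says each APU can address twice as many locations and that remote writes leave through the bank control's external port and reach the other BE's DMAC and switch unit. `route` is a hypothetical helper.

```python
from typing import Tuple

DRAM_BYTES = 64 * 1024 * 1024  # 64 MB per BE, the patent's preferred embodiment


def route(address: int, own_be: int) -> Tuple[int, int, str]:
    """Map a global address to (target BE, offset in that BE's DRAM, path taken).

    Hypothetical layout: BE 0 owns [0, 64 MB), BE 1 owns [64 MB, 128 MB).
    The path strings follow the order of units named in [0081] and [0083].
    """
    if not 0 <= address < 2 * DRAM_BYTES:
        raise ValueError("address outside the two-BE memory space")
    target_be, offset = divmod(address, DRAM_BYTES)
    if target_be == own_be:
        path = "own DMAC -> cross-bar -> own bank control"
    else:
        path = ("own DMAC -> cross-bar -> own bank control -> external port"
                " -> remote DMAC -> remote switch unit -> remote bank control")
    return target_be, offset, path


print(route(DRAM_BYTES + 5, own_be=0))
```

So in this model an APU on BE 0 writing to global address 64 MB + 5 goes out through the external port and lands at offset 5 in BE 1's DRAM, while any address below 64 MB stays on its own cross-bar.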

The bus that feeds the APU is 1,024 bits wide, as you can read in the patent as well:

[0070] APU 402 further includes bus 404 for transmitting applications and data to and from the APU. In a preferred embodiment, this bus is 1,024 bits wide. APU 402 further includes internal busses 408, 420 and 418. In a preferred embodiment, bus 408 has a width of 256 bits and provides communications between local memory 406 and registers 410. Busses 420 and 418 provide communications between, respectively, registers 410 and floating point units 412, and registers 410 and integer units 414. In a preferred embodiment, the width of busses 418 and 420 from registers 410 to the floating point or integer units is 384 bits, and the width of busses 418 and 420 from the floating point or integer units to registers 410 is 128 bits. The larger width of these busses from registers 410 to the floating point or integer units than from these units to registers 410 accommodates the larger data flow from registers 410 during processing. A maximum of three words are needed for each calculation. The result of each calculation, however, normally is only one word.
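The 384/128-bit split in [0070] is easy to check: three 128-bit source words in, one 128-bit result out (reading "word" as 128 bits is an inference from 384 / 3). A multiply-add is a natural example of a three-operand calculation. The sketch assumes a 128-bit word holds four 32-bit floats; Python floats stand in for the lanes, so single-precision rounding is not modeled, and `madd` is a made-up name.

```python
from typing import List

WORD_BITS = 128
LANES = 4  # assumption: one 128-bit word = four 32-bit floats


def madd(a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Lane-wise a * b + c: three source words read, one result word written."""
    if not len(a) == len(b) == len(c) == LANES:
        raise ValueError("each operand must be one 4-lane word")
    return [x * y + z for x, y, z in zip(a, b, c)]


bits_to_unit = 3 * WORD_BITS    # three operands per calculation (register -> unit)
bits_from_unit = 1 * WORD_BITS  # one result per calculation (unit -> register)
print(madd([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0]))
print(bits_to_unit, bits_from_unit)
```

That is exactly the asymmetry the patent describes: 384 bits toward the floating point or integer units, 128 bits back to the registers.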
 