The PSP CPU is not based on EE or GS or PSOne CPU: link...

I'm with MfA on this.

The architectural overhead in Cell is obvious. Yes, Cell is scalable, but this scalability comes at a cost: the trappings, as MfA calls them.

Scalability isn't free, it's a goal and thus requires trade-offs.

Cell is a generalised solution with a fair bit of tweaking headroom, but a custom solution will likely whip the pants off it since it won't make the trade-offs that Cell has to make.
 
Saem said:
I'm with MfA on this.

The architectural overhead in Cell is obvious. Yes, Cell is scalable, but this scalability comes at a cost: the trappings, as MfA calls them.

Scalability isn't free, it's a goal and thus requires trade-offs.

Cell is a generalised solution with a fair bit of tweaking headroom, but a custom solution will likely whip the pants off it since it won't make the trade-offs that Cell has to make.

Overall power is not the problem, but the fact remains that an architecture naturally geared to vector processing and with high bandwidth ( thanks to the e-DRAM ) will perform very well on multimedia applications ( including 3D graphics )...

And the trade-offs are worth it, since the major goal is having more and more Cell based devices that can easily inter-operate with each other... I think the big picture for Cell is worth the costs you mention...

A single PE solution would not be horrible...

1-2 APUs at 150 MHz would mean 1.2-2.4 GFLOPS, which is Dreamcast level or above ( that is the 2 APU solution, while 1 APU gets 1.2 GFLOPS and 1.2 GOPS ), and the APUs do not have to take that much space... having Local Storage, a good amount of registers and e-DRAM should keep the efficiency from falling too much...
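To make the arithmetic behind those figures explicit, here is a rough sketch. It assumes each APU retires 8 floating point operations per cycle, which is simply what 1.2 GFLOPS at 150 MHz implies; the function name and the constant are mine, not something from the patent, and these are peak numbers only.

```python
# Assumption: 8 FLOPs per cycle per APU, implied by 1.2 GFLOPS at 150 MHz.
FLOPS_PER_CYCLE_PER_APU = 8


def peak_gflops(num_apus, clock_mhz, flops_per_cycle=FLOPS_PER_CYCLE_PER_APU):
    """Theoretical peak in GFLOPS; real throughput would be lower."""
    return num_apus * clock_mhz * 1e6 * flops_per_cycle / 1e9


for n in (1, 2):
    print(f"{n} APU(s) at 150 MHz: {peak_gflops(n, 150):.1f} GFLOPS peak")
```

The peak scales linearly with both the APU count and the clock, so the same function covers any other configuration you want to plug in.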

In addition to that we have a lean and compact RISC processor acting as PU, a DMAC, a Pixel Engine + Image Cache + Display Controller ( in the more PlayStation 3 like configuration these three things took the space of 4 APUs, but considering this would be a PDA like configuration with lower resolution and memory requirements I think the size of the Image Cache would be lower and the complexity of the Pixel Engine and the Display Controller would be reduced as well )... we already knew we had e-DRAM...

That doesn't sound too bad, plus you would share tools with PlayStation 3 ( even if in a single PE, single APU configuration you would have much less trouble synchronizing the Hardware resources )... you get a Cell compatible device with the power necessary to push out quite nice 3D graphics and sound... I think the trade-off is worth the effort...

Also this custom solution will not be able to network with other Cell devices as easily and will lose efficiency running complex abstraction layers...
 
I think what you can reasonably put in is 1 fast core (200-300 MHz) with SIMD extensions and a basic 3D chip (don't go expecting pixel shaders, and certainly don't expect floating point fragment processing ... hardware T&L maybe). You just aren't going to come up with a Cell based design which can go that low in power requirements.

The whole idea of a processing element with a coordinating processor with "worker" units is rather silly with only 1 unit.
 
MfA said:
The whole idea of a processing element with a coordinating processor with "worker" units is rather silly with only 1 unit.

The coordinating processor would run the game logic too of course while the "worker" unit does the T&L + any physics a handheld game might need (which wouldn't be that much). Wouldn't be inefficient or wasteful...

*G*
 
Grall said:
MfA said:
The whole idea of a processing element with a coordinating processor with "worker" units is rather silly with only 1 unit.

The coordinating processor would run the game logic too of course while the "worker" unit does the T&L + any physics a handheld game might need (which wouldn't be that much). Wouldn't be inefficient or wasteful...

*G*

The idea might appear silly, but I honestly do not see why... Would it be much better if you had two APUs? To me it would not change much ( one could still handle T&L and physics while the other handles pixel programs, but I do not expect two APUs ). Grall makes a good point: you still need a processor handling the OS and the game logic, and the APU is not set up to do that by itself.

I want to quote the patent once more...

In accordance with the present invention, all members of a computer network, i.e., all computers and computing devices of the network, are constructed from a common computing module. This common computing module has a consistent structure and preferably employs the same ISA. The members of the network can be, e.g., clients, servers, PCs, mobile computers, game machines, PDAs, set top boxes, appliances, digital televisions and other devices using computer processors. The consistent modular structure enables efficient, high speed processing of applications and data by the network's members and the rapid transmission of applications and data over the network. This structure also simplifies the building of members of the network of various sizes and processing power and the preparation of applications for processing by these members.

Silly, but the ideas behind Cell are clear...

The PU is not something you need only above a certain number of APUs: the PU is needed to have the APUs start processing data at all... the APUs are workers, they need the boss to direct them... whether you have one employee or 8 you still need to give orders, you still need a boss...

[0018] In another aspect, the present invention provides a system and method for the PUs' issuance of commands to the APUs to initiate the APUs' processing of applications and data. These commands, called APU remote procedure calls (ARPCs) [...]

The PU is also responsible for the management of the memory sandboxes...

[0113] The PU of a PE controls the sandboxes assigned to the APUs. Since the PU normally operates only trusted programs, such as an operating system, this scheme does not jeopardize security. In accordance with this scheme, the PU builds and maintains a key control table.

Let me quote another part of the patent ( this is not the final thing maybe, but it displays the intentions behind the project )...

[0057] The computers and computing devices connected to network 104 (the network's "members") include, e.g., client computers 106, server computers 108, personal digital assistants (PDAs) 110, digital television (DTV) 112 and other wired or wireless computers and computing devices. The processors employed by the members of network 104 are constructed from the same common computing module. These processors also preferably all have the same ISA and perform processing in accordance with the same instruction set. The number of modules included within any particular processor depends upon the processing power required by that processor.

[0058] For example, since servers 108 of system 101 perform more processing of data and applications than clients 106, servers 108 contain more computing modules than clients 106. PDAs 110, on the other hand, perform the least amount of processing. PDAs 110, therefore, contain the smallest number of computing modules. DTV 112 performs a level of processing between that of clients 106 and servers 108. DTV 112, therefore, contains a number of computing modules between that of clients 106 and servers 108. As discussed below, each computing module contains a processing controller and a plurality of identical processing units for performing parallel processing of the data and applications transmitted over network 104.

[0059] This homogeneous configuration for system 101 facilitates adaptability, processing speed and processing efficiency. Because each member of system 101 performs processing using one or more (or some fraction) of the same computing module, the particular computer or computing device performing the actual processing of data and applications is unimportant. The processing of a particular application and data, moreover, can be shared among the network's members. By uniquely identifying the cells comprising the data and applications processed by system 101 throughout the system, the processing results can be transmitted to the computer or computing device requesting the processing regardless of where this processing occurred. Because the modules performing this processing have a common structure and employ a common ISA, the computational burdens of an added layer of software to achieve compatibility among the processors is avoided. This architecture and programming model facilitates the processing speed necessary to execute, e.g., real-time, multimedia applications.

I have highlighted some interesting parts...

Another quote ( sorry Marco for all this quoting, I know you know it ad nauseam, I just wanted to build my case for other people who read this )...

[0065] PU 203 can be, e.g., a standard processor capable of stand-alone processing of data and applications. In operation, PU 203 schedules and orchestrates the processing of data and applications by the APUs. The APUs preferably are single instruction, multiple data (SIMD) processors. Under the control of PU 203, the APUs perform the processing of these data and applications in a parallel and independent manner. DMAC 205 controls accesses by PU 203 and the APUs to the data and applications stored in the shared DRAM 225. Although PE 201 preferably includes eight APUs, a greater or lesser number of APUs can be employed in a PE depending upon the processing power required.


[0126] DMA command list 2334 also includes a series of kick commands, e.g., kick commands 2355 and 2358. Kick commands are commands issued by a PU to an APU to initiate the processing of a cell. DMA kick command 2355 includes virtual APU ID 2352, kick command 2354 and program counter 2356. Virtual APU ID 2352 identifies the APU to be kicked, kick command 2354 provides the relevant kick command and program counter 2356 provides the address for the program counter for executing the program. DMA kick command 2358 provides similar information for the same APU or another APU.

[0127] As noted, the PUs treat the APUs as independent processors, not co-processors. To control processing by the APUs, therefore, the PU uses commands analogous to remote procedure calls. These commands are designated "APU Remote Procedure Calls" (ARPCs). A PU implements an ARPC by issuing a series of DMA commands to the DMAC. The DMAC loads the APU program and its associated stack frame into the local storage of an APU. The PU then issues an initial kick to the APU to execute the APU Program.

[0128] FIG. 24 illustrates the steps of an ARPC for executing an apulet. The steps performed by the PU in initiating processing of the apulet by a designated APU are shown in the first portion 2402 of FIG. 24, and the steps performed by the designated APU in processing the apulet are shown in the second portion 2404 of FIG. 24.

[0129] In step 2410, the PU evaluates the apulet and then designates an APU for processing the apulet. In step 2412, the PU allocates space in the DRAM for executing the apulet by issuing a DMA command to the DMAC to set memory access keys for the necessary sandbox or sandboxes. In step 2414, the PU enables an interrupt request for the designated APU to signal completion of the apulet. In step 2418, the PU issues a DMA command to the DMAC to load the apulet from the DRAM to the local storage of the APU. In step 2420, the DMA command is executed, and the apulet is read from the DRAM to the APU's local storage. In step 2422, the PU issues a DMA command to the DMAC to load the stack frame associated with the apulet from the DRAM to the APU's local storage. In step 2423, the DMA command is executed, and the stack frame is read from the DRAM to the APU's local storage. In step 2424, the PU issues a DMA command for the DMAC to assign a key to the APU to allow the APU to read and write data from and to the hardware sandbox or sandboxes designated in step 2412. In step 2426, the DMAC updates the key control table (KTAB) with the key assigned to the APU. In step 2428, the PU issues a DMA command "kick" to the APU to start processing of the program. Other DMA commands may be issued by the PU in the execution of a particular ARPC depending upon the particular apulet.
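The sequence in [0129] can be sketched roughly as below. This is only a toy model to show the ordering of the steps: the class names, the dictionary based "DRAM" and the pick-the-first-idle-APU policy are my own assumptions, not anything the patent specifies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class APU:
    apu_id: int
    busy: bool = False
    local_storage: dict = field(default_factory=dict)
    interrupt_enabled: bool = False
    program_counter: Optional[int] = None


@dataclass
class DMAC:
    dram: dict                                        # apulet name -> program and stack frame
    sandbox_keys: dict = field(default_factory=dict)  # sandbox id -> memory access key
    ktab: dict = field(default_factory=dict)          # APU id -> key (key control table)

    def load(self, apu, apulet, part):
        # Steps 2420 / 2423: copy one part of the apulet from DRAM to local storage.
        apu.local_storage[part] = self.dram[apulet][part]


def arpc(dmac, apus, apulet, sandbox, key, entry_pc):
    """PU side of an ARPC, in the order given in [0129]."""
    steps = []
    apu = next(a for a in apus if not a.busy)  # 2410: designate an APU (assumed policy)
    steps.append(2410)
    dmac.sandbox_keys[sandbox] = key           # 2412: set keys for the sandbox
    steps.append(2412)
    apu.interrupt_enabled = True               # 2414: completion interrupt
    steps.append(2414)
    dmac.load(apu, apulet, "program")          # 2418 + 2420: load the apulet
    steps += [2418, 2420]
    dmac.load(apu, apulet, "stack_frame")      # 2422 + 2423: load its stack frame
    steps += [2422, 2423]
    dmac.ktab[apu.apu_id] = key                # 2424 + 2426: assign key, update KTAB
    steps += [2424, 2426]
    apu.program_counter = entry_pc             # 2428: kick
    apu.busy = True
    steps.append(2428)
    return apu, steps


# A single PE with a single APU still goes through every step.
dmac = DMAC(dram={"transform": {"program": b"\x00\x01", "stack_frame": b"\x10"}})
apus = [APU(apu_id=0)]
apu, steps = arpc(dmac, apus, "transform", sandbox=0, key=0xA5, entry_pc=0)
print(steps)
```

Note that nothing in the sequence depends on how many APUs there are: with one APU or eight, the PU still sets up the sandbox, loads the program and issues the kick.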

Sorry for the long post...
 
MfA said:
In a 1 person company you dont need a boss.

Enter Adam Smith to show us how even a 1 person pin company gets its ass kicked by division of labor.

Would you rather be the sole worker of a company with no overhead, as everything is processed internally by you alone, or have a 5000 worker company and put up with the small overhead to coordinate them? I thought we tackled this problem in socio-economic terms a few hundred years ago.
 
MfA said:
If food is in short supply what you gonna do ...

The same thing the corporate world does: you scale your employment based on the given task that needs to be completed/produced in each situation.

You don't have one superguy doing everything. I think there is enough empirical evidence, as seen by IBM R&D, that the single worker paradigm is flawed in its budget of transistors and how it's allocated, and that inherently lower complexity architectures can overcome this. Yet, we shall see.
 
MfA said:
In a 1 person company you dont need a boss.
If you want 3D acceleration, you will have more than one person in that company one way or another. The only question is whether you would have all the workers stuffed inside a single cubicle or not.

And frankly, with a freaking optical drive and a backlit high-res display, I doubt calculation units will be the main consumers of food :p
 