"PS3.5"? – New Sony Patent

IT IS NOT AN ADD-ON FOR PS3! The patent describes a hardware feature of a processor that needs a massive connecting bus to function as a processing extension. There's nothing about the PS3 that enables this: its original Cell lacks the dual-channel IO interface hardware, and it has no external bus that can support the patent's suggested 60 GB/s communication system. This rumour needs to die.

It can't be. You'll never have an external interconnect capable of running as fast as on-chip communications. Three separate dies will communicate with each other an order of magnitude slower than on the same die, and memory access will be less efficient if they're having to manage memory in three places instead of just one. The only way your proposed idea would make sense is if there's a reason to have a GPU with embedded Cell, such as for a common graphics engine. But that prescribes the same GPU for all devices, which doesn't make sense, as GPU requirements vary considerably. It makes far more sense to have separate 4-SPU Cells connected up with whatever GPU is suited to that device, and then use the patent to combine these Cells across devices. And if you need more than one of those 4-SPU chips, in the numbers needed for a console, it makes more sense to have them all on one chip; we have multicore chips for a reason!

The patent describes the processes in which a 4-SPU Cell could be used. Those are descriptions of SOFTWARE functions. A direct connection to the memory bus was mentioned.

The limiting factors now are pin count, heat (which rises with density), and real-world functions, i.e. waiting on input and output. Having a common memory bus that runs at nearly 1:1 with the CPUs changes the design criteria.

We are not talking Cray speeds here, so some issues are irrelevant, like requiring all three Cell processors on the same die to reduce the length of bus lines. RISC chips were designed to increase yield by having fewer transistors per CPU, which both reduced the failure percentage and allowed more CPUs per wafer. Putting 3 Cell processors in the same package would increase costs because the failure percentage would increase with complexity. Failures due to heat would also increase.

There have been papers describing the features and faults of a heterogeneous CPU design like the Cell, and I believe the conclusion is that the idiot-savant SPU needs supervision, and that more than 4 SPUs per PPU results in performance issues. Thus having more than 4 SPUs connected to a common bus with one PPU is not a good design. Just so we understand each other: the bus inside a Cell should serve a ratio of 1 PPU to 4 SPUs. This internal on-die bus should not have other Cell processors connected to it. Outside the Cell it's connected to a common memory bus.

There are, as you mentioned, issues with multiple Cells addressing the same memory bus (or, inside a die, the same Cell bus), and I expect the maximum number might be three before collisions reduce performance. A Cell sitting between real-world I/O and the memory bus would reduce memory accesses and increase performance.

There will be NO three 4-SPU Cells on the same die. This has been confirmed by IBM (no 32-SPU or 16-SPU chips). Separate 4-SPU Cells will be positioned far enough apart to more easily allow heat dissipation, but close enough that bus length is not an issue. They will have assigned functions like GPU, IO or central processor. I can see a die with generalized functions and one 4-SPU Cell usable in multiple products.
 
It's a very interesting patent, not because of the specific mechanism, but because of the general idea of applying distributed-computing concepts to home consoles. The full SMP-type setup will not be practical with the current hardware, but it may be possible to go "half way". The key problem is incremental cost. We should just wait for Sony to say something, if ever.

It's interesting, but it'll never happen, which is why it isn't mentioned in the patent. There are also scaling issues, synchronization issues, latency issues, and a whole host of other issues involved with distributed computing. All of which are extremely hostile to game code, especially if you want low-latency controls.

Add to that: it took 2+ years for developers to come to grips with the PS3 architecture, and now you're thinking about throwing problems at them that are orders of magnitude more difficult to solve? It would require new development tools, new best practices, etc.

Do you really think Sony wants a repeat of that? As Shifty suggested earlier, a better solution for a future Cell-based gaming system would be one that features 2-3 PPUs in addition to 8 SPUs, possibly more. Or just drop Cell and move onto something more conventional, with a more capable GPU and a more general-purpose CPU, just as Apple eventually had to ditch PowerPC when R&D on it fell rapidly behind x86 CPUs.

Regards,
SB
 
Where have IBM confirmed a 4 SPU Cell chip, especially with a new IO controller?

IBM didn't confirm a 4-SPU Cell, as you know. IBM DID confirm no 16-SPU Cell. Sony has confirmed a 4-SPU Cell in the recently released patent.

As to a die with a 4-SPU Cell and IO circuitry, that is speculation on my part. It's getting easy to produce a custom run with building blocks as long as they don't introduce "issues" into the fab process. A 4-SPU Cell using the latest fab technology to increase efficiency and reduce heat could be such a building block, especially at slower speeds.

So we could have a die for IO running at slower speeds that includes a 4-SPU Cell. This would be used in all internet-enabled Sony products: Gigabit port, USB, HDMI, audio.
 
It's interesting, but it'll never happen, which is why it isn't mentioned in the patent. There are also scaling issues, synchronization issues, latency issues, and a whole host of other issues involved with distributed computing. All of which are extremely hostile to game code, especially if you want low-latency controls.

Add to that: it took 2+ years for developers to come to grips with the PS3 architecture, and now you're thinking about throwing problems at them that are orders of magnitude more difficult to solve? It would require new development tools, new best practices, etc.

Do you really think Sony wants a repeat of that? As Shifty suggested earlier, a better solution for a future Cell-based gaming system would be one that features 2-3 PPUs in addition to 8 SPUs, possibly more. Or just drop Cell and move onto something more conventional, with a more capable GPU and a more general-purpose CPU, just as Apple eventually had to ditch PowerPC when R&D on it fell rapidly behind x86 CPUs.

Regards,
SB
Given all the above as legitimate issues, a digital ecosystem of multiple Sony products requiring DRM justifies a Cell architecture. HDMI, or rather DRM-protected video streams over a network, allowing any Sony product to display on a Sony TV from any room, is possible with the Cell.

With all Sony products using a Cell the investment in OS software can be justified.

There are processes, like video or audio, that have predictable recurring cycles and can easily be processed in a distributed fashion. For instance, 3D can be processed by a TV given 2D plus depth from a player/game console, reducing overhead for the game console.
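
As a concrete illustration of the kind of regular, predictable work that could be farmed out this way, here is a minimal sketch in C of depth-image-based rendering, where the display side synthesises a second eye's view from a 2D frame plus per-pixel depth. The buffer layout, depth encoding and parallax scale are my own assumptions for illustration, not anything taken from the patent:

```c
#include <stdint.h>
#include <stddef.h>

/* Naive depth-image-based rendering (DIBR): synthesise a second
 * eye's view by shifting each pixel horizontally in proportion to
 * its depth. Buffer layout, depth encoding (255 = near) and the
 * parallax scale are illustrative assumptions, not patent details. */
void synthesize_right_view(const uint32_t *colour, /* w*h packed pixels */
                           const uint8_t  *depth,  /* w*h depth values  */
                           uint32_t       *out,    /* w*h output view   */
                           int w, int h, int max_disparity)
{
    for (int y = 0; y < h; ++y) {
        const uint32_t *src = colour + (size_t)y * w;
        const uint8_t  *d   = depth  + (size_t)y * w;
        uint32_t       *dst = out    + (size_t)y * w;

        /* Start from a copy so disoccluded pixels keep the original
         * colour instead of being left undefined (crude hole filling). */
        for (int x = 0; x < w; ++x)
            dst[x] = src[x];

        for (int x = 0; x < w; ++x) {
            int shift = (d[x] * max_disparity) / 255; /* nearer = bigger shift */
            int xs = x - shift;
            if (xs >= 0)
                dst[xs] = src[x];
        }
    }
}
```

The per-pixel work is fixed and the data flows one way, which is exactly why this kind of job tolerates being moved across a link to another processor.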

My speculation is based on the publishing of the Sony patent; the above is what I believe it suggests. If DRM is the REQUIRED best use of the Cell, it could serve as a super IO controller with DRM in all Sony products, combined with your suggestion of a conventional CPU and GPU.

We are all just best guessing.
 
Just a quick comment on PPU to SPU ratio: for most applications a PPU isn't actually necessary. SPUs can feed and manage other SPUs for most critical applications. Insomniac published on this a few years ago.
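
As a rough illustration of that idea (a sketch of my own, not Insomniac's actual code): workers pull jobs from a shared list by atomically bumping a ticket counter, so no PPU has to feed them. C11 atomics stand in here for the atomic DMA a real SPU would use on a reserved cache line:

```c
#include <stdatomic.h>
#include <stddef.h>

/* PPU-less job dispatch in the spirit of SPU task systems: each
 * worker (SPU) claims the next job by atomically incrementing a
 * shared ticket counter, then runs it. No central supervisor. */

typedef void (*job_fn)(void *arg);

struct job      { job_fn run; void *arg; };
struct job_list { struct job *jobs; size_t count; atomic_size_t next; };

/* Each worker loops until the list is drained. */
void worker_main(struct job_list *jl)
{
    for (;;) {
        size_t i = atomic_fetch_add(&jl->next, 1); /* claim a ticket */
        if (i >= jl->count)
            break;                                 /* no work left */
        jl->jobs[i].run(jl->jobs[i].arg);
    }
}
```

One atomic increment per job is all the coordination needed, which is why the PPU can largely stay out of the loop.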
 
Just a quick comment on PPU to SPU ratio: for most applications a PPU isn't actually necessary. SPUs can feed and manage other SPUs for most critical applications. Insomniac published on this a few years ago.

My understanding is that the above is true, BUT it requires an SPU to emulate a PPU, and that severely reduces performance; it was only done because one PPU was not enough to manage even 7 SPUs. A RISC can emulate a CISC, but at a cost: larger code requiring more memory and cycles. The SPUs can be thought of as super-RISC chips, while the PPU, a RISC chip, can be thought of as a CISC when compared to an SPU.

Your comment points out that the 1:7 PPU-to-SPU ratio of the PS3's Cell is below the optimum design. Sony's choice of 1:4 might be optimal given the uses it will be put to.
 
IBM didn't confirm a 4-SPU Cell, as you know. IBM DID confirm no 16-SPU Cell. Sony has confirmed a 4-SPU Cell in the recently released patent.
A patent isn't a confirmation of anything other than an idea! Most patents never become real products. This patent does not signify the existence of, or an intention to create, a 4-SPU Cell, in exactly the same way that the patent for the Cell Visualiser, a Cell-based GPU, did not mean Sony were making one. Neither does this patent undo the original scalable architecture of Cell that would allow more cores on the same die.

Your idea that there are better economies of scale doesn't stack up when you look at every dual and quad core CPU out there. These aren't made by fabricating individual cores and assembling them on a package, but by manufacturing the multicore die up front. Why should Cell be any different? Why would it be better for Sony to manufacture 3 tiny dies and assemble them on a chip, rather than manufacture one small die (a 3+12 configuration isn't a large chip by modern standards), when for Intel and AMD and everyone else, putting 2 or 4 or more cores on the same, larger die is the better economy? Let alone separating them across chips!
 
A patent isn't a confirmation of anything other than an idea! Most patents never become real products. This patent does not signify the existence of, or an intention to create, a 4-SPU Cell, in exactly the same way that the patent for the Cell Visualiser, a Cell-based GPU, did not mean Sony were making one. Neither does this patent undo the original scalable architecture of Cell that would allow more cores on the same die.

Your idea that there are better economies of scale doesn't stack up when you look at every dual and quad core CPU out there. These aren't made by fabricating individual cores and assembling them on a package, but by manufacturing the multicore die up front. Why should Cell be any different? Why would it be better for Sony to manufacture 3 tiny dies and assemble them on a chip, rather than manufacture one small die (a 3+12 configuration isn't a large chip by modern standards), when for Intel and AMD and everyone else, putting 2 or 4 or more cores on the same, larger die is the better economy? Let alone separating them across chips!

All the above can be true. I seem to remember a discussion in this forum on multiple SPEs in the same die connected to the same bus sharing 256K of memory, and the conclusion was that too many on the same bus would cause issues that reduced efficiency. That might be a reason for not having more than 4 or 8 SPEs plus PPUs on the same bus. IF they are not on the same bus, then moving them off the die to reduce heat concentration makes sense.

IF the PS4 is to have video recognition/voice recognition/gesture control and USB 3 ports, then a 4-SPU Cell used as a preprocessor for the real-world information, with its own memory (64 MB or so) to store video frames for comparison or audio for voice-recognition routines, might be a good use.

Shifty, the actual main PS4 CPU could have several, maybe three, 4-SPU Cells on one die. If that's the best way then so be it. But I can see distributed processing inside the PS4 as well as outside the PS4. It just makes sense.

Your idea that there are better economies of scale doesn't stack up when you look at every dual and quad core CPU out there. These aren't made by fabricating individual cores and assembling them on a package, but by manufacturing the multicore die up front. Why should Cell be any different?

The Cell is different. More complex instruction-set chips require smaller stacks/registers and fewer memory-access cycles. You can put three or 4 in a package and might not have issues with registers. SPUs are super-RISC-like and require that each has its own register memory connected to a common bus memory, and then connected externally and directly to main memory, with the memory of choice being Rambus so as not to limit extremely intensive memory access.
 
IF the PS4 is to have video recognition/voice recognition/gesture control and USB 3 ports, then a 4-SPU Cell used as a preprocessor for the real-world information, with its own memory (64 MB or so) to store video frames for comparison or audio for voice-recognition routines, might be a good use.
What's the advantage of that extra complexity of a discrete RAM pool and separate memory bus over having those 4 cores on the main CPU sharing one bus, where you can invest the budget in making it a faster bus instead of splitting the budget over a subunit? The only way it makes sense to me is if the CPU is large and costly and still not fast enough, requiring some extra power from somewhere, in which case you have to add an extra CPU. The amount of effort needed for video tracking would be all of, say, 10% of a decent-sized future Cell chip, so it's not worth the effort of designing a subunit. The lowest cost for a device is obtained by using the lowest number of packages and components, the fewest buses, and the simplest mobos.

Shifty, the actual main PS4 CPU could have several, maybe three, 4-SPU Cells on one die. If that's the best way then so be it.
But it isn't, as evidenced by generations of processors all working on integrating components/cores onto the same die instead of separating them. If there were a practical advantage to three 4-SPU Cells on a package as you've suggested, then that same advantage would apply to 4 discrete x86 cores on the same package, which is what we'd be seeing from Intel. But we don't; instead Intel makes 2, 4, 6 and 8 core chips rather than the theoretically more flexible networked monocores configured to whatever format is wanted. Matters of production, yields and thermals in use limit the maximum size of a chip, which is why we have distributed computing, but you don't start using lots of the same chip until you reach the physical limits, because it's neither cost- nor performance-effective to do so.
 
They serve different needs, and belong to different layers of the architecture.

A distributed-system approach to system design allows users to upgrade their box(es) incrementally and preserve their existing investments (just like what the Air Force people said when they upgraded their PS3 cluster!). The most difficult problem is whether they can get the incremental cost low enough.

As for programming issues, I think it depends on the applications. Assuming the external unit comes with SPUs and memory...

You can use the extra memory as a VM extension, RAM disk, or cache for accessing Blu-ray. PS3 is difficult to program mostly because of the small Local Store and the small system memory. For applications like a WebKit browser and Home, it'd be a godsend.

For applications like background recording in PlayTV/Torne, background downloads in the XMB or even Blu-ray, the kernel should be able to move the jobs to the external SPUs and memory. It should be transparent to the running app after they are loaded onto the SPUs.

For something like an HD camera unit or an ultrasonic sensor (in their other patents), the outboard SPUs and memory can pre-process the input independently and appear as a smart input device to the existing system.

For parallel rendering, it'd depend on their experiments with the GT5 parallel renderer. It may require extra hardware to sync the HDMI output (tiles or alternate frames?). They may need it anyway if they want a full 3D Blu-ray stack without audio or Java compromises. Old PS3 owners might get HDMI 1.4 automatically.

There are other interesting backward-compatibility patents: there is one for SPUs to emulate a PS2, but it's hindered by internal bandwidth (would eDRAM work?). The other backward-compatibility patent specifies an ethernet-connected module. These are all bits of possibilities and experiments, but the chief problem may be the economics (even if Sony sell the modules directly from their eStore). The rest may be addressable by technical means.

Well, now that the private key exploit may cause Sony to lose revenue big time, additional hardware module sales may help to beef up both revenue and the security mechanism. If the next gen is based on some sort of Cell, then they need to spread its use as much as possible this gen (including the parallel renderer). Ironically, the security problem may help to achieve that, both in cultivating Cell competency and in volume sales.
 
They serve different needs, and belong to different layers of the architecture.
Are you talking about the same thing as Jeff? This post of yours seems to be talking about processors spread over devices, or an upgrade device for PS3. Jeff was suggesting discrete dies in the same device. Remember, this patent does not apply to PS3 getting a hardware processor+RAM extension to aid things like web browsing!

You can use the extra memory as a VM extension, RAM disk, or cache for Blu-ray. PS3 is difficult to program mostly because of the small Local Store and the small system memory.
Not really true. LS isn't particularly small. It has as much RAM as the original Amiga, which was able to multitask several applications in that space (although you really wanted 512 KB). The issues with programming PS3 are managing its resources, having to find out how to do stuff on Cell to support RSX, and writing efficient SPU code, which is all about data structures. The memory management is a faff, having to determine yourself what to prefetch rather than having a cache handle that for you, but that's not the biggest concern.
 
Are you talking about the same thing as Jeff? This post of yours seems to be talking about processors spread over devices, or an upgrade device for PS3. Jeff was suggesting discrete dies in the same device. Remember, this patent does not apply to PS3 getting a hardware processor+RAM extension to aid things like web browsing!

This patent describes a mechanism to "provide for interconnecting one or more multiprocessors and one or more external devices through one or more configurable interface circuits, which are adapted for operation in: (i) a first mode to provide a coherent symmetric interface; or (ii) a second mode to provide a non-coherent interface."

The memory mapped I/O alone doesn't need this patent. Cross-device SPU communication may need part of the patent.

Not really true. LS isn't particularly small. It has as much RAM as the original Amiga, which was able to multitask several applications in that space (although you really wanted 512 KB). The issues with programming PS3 are managing its resources, having to find out how to do stuff on Cell to support RSX, and writing efficient SPU code, which is all about data structures. The memory management is a faff, having to determine yourself what to prefetch rather than having a cache handle that for you, but that's not the biggest concern.

You have to load the SPU program into LS, then split the remaining memory in two for double buffering. Based on an earlier post by nAo, we can't run the SPUlet in debug mode because the LS is too small to fit the debug run-time, so you have to troubleshoot blind.

Restructuring data structures into contiguous memory and the explicit DMA fetches are indeed a big software design challenge. For a closed box, explicit memory management may be advantageous (with or without the DMA fetches).
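
For anyone who hasn't seen the pattern being described, here is roughly what the double-buffered fetch loop looks like on an SPU, using the Cell SDK's spu_mfcio.h intrinsics. The chunk size, alignment and the process() kernel are illustrative assumptions, and the sketch assumes the total length is a multiple of the chunk size:

```c
#include <spu_mfcio.h>
#include <stdint.h>

/* Classic double-buffered SPU DMA loop: while the SPU processes
 * buffer A, the MFC streams the next chunk into buffer B, hiding
 * main-memory latency behind computation. */
#define CHUNK 16384                      /* bytes per DMA transfer */

static char buf[2][CHUNK] __attribute__((aligned(128)));

extern void process(char *data, uint32_t size);  /* your compute kernel */

void stream_in(uint64_t ea, uint32_t total)      /* total % CHUNK == 0 */
{
    uint32_t offset = 0, which = 0;

    /* Prime the pipeline: start the first transfer on tag 0. */
    mfc_get(buf[0], ea, CHUNK, 0, 0, 0);

    while (offset < total) {
        uint32_t next = offset + CHUNK;

        /* Kick off the next transfer on the other buffer/tag. */
        if (next < total)
            mfc_get(buf[which ^ 1], ea + next, CHUNK, which ^ 1, 0, 0);

        /* Wait only for the buffer we are about to consume. */
        mfc_write_tag_mask(1 << which);
        mfc_read_tag_status_all();

        process(buf[which], CHUNK);

        which ^= 1;
        offset = next;
    }
}
```

The point is the overlap: the SPU only ever waits on the buffer it is about to use, while the other transfer proceeds in the background.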

As for main memory issues, we are hearing regular developer complaints about memory limits, especially now that they are more familiar with SPU development and have become more ambitious.
 
What's the advantage of that extra complexity of a discrete RAM pool and separate memory bus over having those 4 cores on the main CPU sharing one bus, where you can invest the budget in making it a faster bus instead of splitting the budget over a subunit? The only way it makes sense to me is if the CPU is large and costly and still not fast enough, requiring some extra power from somewhere, in which case you have to add an extra CPU. The amount of effort needed for video tracking would be all of, say, 10% of a decent-sized future Cell chip, so it's not worth the effort of designing a subunit. The lowest cost for a device is obtained by using the lowest number of packages and components, the fewest buses, and the simplest mobos.

But it isn't, as evidenced by generations of processors all working on integrating components/cores onto the same die instead of separating them. If there were a practical advantage to three 4-SPU Cells on a package as you've suggested, then that same advantage would apply to 4 discrete x86 cores on the same package, which is what we'd be seeing from Intel. But we don't; instead Intel makes 2, 4, 6 and 8 core chips rather than the theoretically more flexible networked monocores configured to whatever format is wanted. Matters of production, yields and thermals in use limit the maximum size of a chip, which is why we have distributed computing, but you don't start using lots of the same chip until you reach the physical limits, because it's neither cost- nor performance-effective to do so.

Patsu, can you explain it to Shifty? I don't seem to have the skill set to do so.

From Patsu: "For something like an HD camera unit or an ultrasonic sensor (in their other patents), the outboard SPUs and memory can pre-process the input independently and appear as a smart input device to the existing system." The Cell that pre-processes does not have to be outside the PS4. For some real-world functions the hit on the main CPU's memory can be large, and preprocessing the real-world information is a benefit to performance.
 
Are we just having some crazy misunderstanding of what on-die and on-package constitute? :|
 
This patent describes a mechanism to "provide for interconnecting one or more multiprocessors and one or more external devices through one or more configurable interface circuits, which are adapted for operation in: (i) a first mode to provide a coherent symmetric interface; or (ii) a second mode to provide a non-coherent interface."

The memory mapped I/O alone doesn't need this patent. Cross-device SPU communication may need part of the patent.
A patent is summed up in its first clause, not its introduction.

This patent is for:
1. A multiprocessor system, comprising:
a plurality of processors operatively coupled to one another over one or more communication busses; and
a configurable interface circuit operating in a first mode and a second mode, either simultaneously or alternatively, in response to one or more control signals yada yada...
The thing being patented here is the idea of the BIC, a memory interface that can swap between modes and isn't fixed in function as it is in current processors. This patent is not just for networking processors! Every idea anyone has had regarding PS3 expansions, adding processing and RAM and whatnot, may well be valid and workable somehow, but it is not covered by this patent, which requires a configurable interface circuit (the BIC). Any and every PS3 idea that is not using a BIC or emulating a BIC somehow does not belong in this thread. ;)
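
To make the claim language concrete, here is a toy C model (the naming is entirely mine, nothing from the patent) of an interface whose control signals switch it between a coherent peer mode and a non-coherent device mode, with two channels to capture the "simultaneously or alternatively" wording:

```c
#include <assert.h>

/* Toy model of a configurable interface circuit (BIC): the same
 * physical interface operates either coherently (a peer Cell appears
 * as a symmetric multiprocessor on the bus) or non-coherently (it
 * appears as a plain memory-mapped I/O device), selected per channel
 * by a control signal. All names here are invented for illustration. */

enum bic_mode {
    BIC_COHERENT,     /* peer multiprocessor: cache-coherent, symmetric */
    BIC_NONCOHERENT   /* external device: plain memory-mapped I/O       */
};

struct bic {
    enum bic_mode mode[2];   /* two channels: modes may differ, i.e.
                                "simultaneously or alternatively"      */
};

/* "...in response to one or more control signals": reconfigure one
 * channel without touching the other. */
static inline void bic_configure(struct bic *b, int channel, enum bic_mode m)
{
    assert(channel == 0 || channel == 1);
    b->mode[channel] = m;
}
```

The point of the claim, as I read it, is exactly this switchability: a fixed-function FlexIO-style interface wouldn't infringe, and neither would a plain network link.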

Patsu, can you explain it to Shifty? I don't seem to have the skill set to do so.

From Patsu: ... The Cell that pre-processes does not have to be outside the PS4. For some real-world functions the hit on the main CPU's memory can be large, and preprocessing the real-world information is a benefit to performance.
Whether the 4-SPU Cell is inside the PS4 or outside, it's still being 'broken off' from the main CPU. A 4-SPU Cell in the PS4 but outside the central Cell processor is the same thing as an x86 core inside your PC but outside the PC's main CPU. You are still fracturing your processor to no benefit.

Explain to me why I would be better off with 4 separate x86 processors in my PC rather than one 4-core CPU.

Are we just having some crazy misunderstanding of what on-die and on-package constitute? :|
I dunno what's going on!
 
http://www.engadget.com/2010/12/22/sony-buys-back-toshibas-cell-plant-for-50-billion-yen-makes-a/

Sony buys back Toshiba's Cell fab plant

Looks like Toshiba's Cell processor ambitions didn't quite pan out -- Japanese news sources are reporting that the company's selling its Nagasaki manufacturing plant back to Sony for 50 billion yen, or roughly $597 million in US money. Considering that Toshiba originally purchased the semiconductor facility for 90 billion yen (then $835 million) back in 2008, it seems like Sony's making out like a bandit here -- and it may have just found the perfect place to build more CMOS chips for its high-end camera lineup, too. Sony reportedly told the Nikkei Business Daily that it may repurpose the facility to produce HD image sensors for cameras and smartphones. What will happen to the chip that launched 40 million PS3s and a graphics co-processor or two? With any luck, we'll find out at CES 2011 quite soon.

Is the timing of the publishing of the Sony Cell patent tied to this sale?

http://www.electronista.com/articles/10/12/24/toshiba.moves.cell.to.sony.outsources.to.samsung/

Toshiba on Friday confirmed a pair of major deals to offset problems with its chip business. The company will end its teamwork on the Nagasaki plant making the Cell processor for PlayStation 3 chips and hand over full control to Sony. The deal should take effect as soon as April.

The Japanese electronics giant has similarly confirmed that it will outsource chip production to Samsung. Toshiba will still design system chips and manufacture some designs but will have Samsung, and potentially other companies, handle the final assembly in more cases. It will start handing over production at about the same time as the Sony deal.

Toshiba is also reorganizing its own divisions and will split them into both a system chip group and a separate group for analog and imaging chips that will also be a catch-all for other hardware. The new structure will take effect on New Year's Day.

http://jesaa.com/2010/12/sony-confirms-toshiba-cell-chip-plant-buy-back/

NSM, which was established in March 2008 and is located in the Nagasaki Technology Center of Sony Semiconductor Kyushu Corporation (“SCK”), has been manufacturing the high-performance “Cell Broadband Engine™” processor, the graphics engine “RSX” and other high-performance semiconductors and leading-edge SoC (system-on-a-chip) for applications in digital consumer products of Toshiba and Sony. The facilities to be transferred would be the fabrication facilities and equipment for the 300 mm wafer line located within the Nagasaki Technology Center purchased by Toshiba from Sony and SCK and leased to NSM in 2008 and other facilities that Toshiba and Sony will agree to transfer among those in which Toshiba invested in connection with the operation by NSM after the purchase.
 