Any news on PS2 backwards compatibility

Are you sure? The original source in Japanese also mentions the GS in connection with the SPU.

Following the GAF thread, it looks like the commentary came from the writer of the article, not from the patent itself. The patent does not say anything specific about GS emulation. The SPU serves to interface with the GS.

This is backward compatibility. It's a lot harder than exploiting Cell from scratch because the code was shipped years ago. :)
Unless they do something to the original source (or the generated binary), it may be difficult to meet certain demands a PS2 game makes. I think it's the eDRAM bandwidth that's "impossible" to emulate in software.

I'd expect a full/partial PS2 emulator (on PS3) would have to cover the "tampering with the source" part, but I could be wrong.
 
Yes but this :
[0059] The graphics subsystem 220 may include a graphics processing unit (GPU) 222 and graphics memory 224. The graphics subsystem 220 may periodically output pixel data for an image from the graphics memory 224 to be displayed on the display device 226. The display device 226 may be any device capable of displaying visual information in response to a signal from the system 200, including CRT, LCD, plasma, and OLED displays. The graphics subsystem 220 may provide the display device 226 with an analog or digital signal. By way of example, the display device 226 may include a cathode ray tube (CRT) or flat panel screen that displays text, numerals, graphical symbols or images. The graphics memory 224 may include a display memory (e.g., a frame buffer) used for storing pixel data for each pixel of an output image. The graphics memory 224 may be integrated in the same device as the GPU 222, connected as a separate device with GPU 222, and/or implemented within the memory 202. Pixel data may be provided to the graphics memory 224 directly from the PPE 204 and/or SPEs 206 including SPU1. Alternatively, the PPE 204 and/or SPEs 206 may provide the GPU 222 with data and/or instructions defining the desired output images, from which the GPU 222 may generate the pixel data of one or more output images. The data and/or instructions defining the desired output images may be stored in memory 202 and/or graphics memory 224. In an embodiment, the GPU 222 may be configured (e.g., by suitable programming or hardware configuration) with 3D rendering capabilities for generating pixel data for output images from instructions and data defining the geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene. The GPU 222 may further include one or more programmable execution units capable of executing shader programs.
from this
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1=sony&s2=%22emotion+engine%22&OS=sony+AND+%22emotion+engine%22&RS=sony+AND+%22emotion+engine%22
?
 
That's how PS3 works (SPE and PPE writing into GPU memory). It does not talk about GS or how to emulate one without loss of functionality.
 
Good thinking. But the local store is only 256K per SPE. In practice, the usable space is smaller since the local store also needs to hold the SPU code and double-buffer the data. They have to DMA like crazy to overlap the fetches.
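
Just to make the double-buffering concrete: a minimal sketch in C using the standard MFC intrinsics from spu_mfcio.h (the chunk size and the process_chunk() consumer are made up for illustration):

#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK 16384  /* 16 KB per buffer; two of these plus code and stack must fit in the 256 KB LS */
static char buf[2][CHUNK] __attribute__((aligned(128)));

extern void process_chunk(char *data, unsigned size);   /* hypothetical consumer */

void stream_in(uint64_t ea, unsigned nchunks)
{
    unsigned cur = 0;
    mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);             /* kick off the first fetch */
    for (unsigned i = 0; i < nchunks; i++) {
        unsigned nxt = cur ^ 1;
        if (i + 1 < nchunks)                             /* prefetch the next chunk while we work */
            mfc_get(buf[nxt], ea + (uint64_t)(i + 1) * CHUNK, CHUNK, nxt, 0, 0);
        mfc_write_tag_mask(1 << cur);                    /* wait only on the buffer we're about to use */
        mfc_read_tag_status_all();
        process_chunk(buf[cur], CHUNK);
        cur = nxt;
    }
}

With code, stack and two buffers all in 256K you can't make CHUNK very big, hence the "DMA like crazy" part.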

Will have to look up the actual LS bandwidth figure. I'm famished. Going out for lunch now ^_^
 
What is the exact bandwidth an SPU has to the local memory on the SPE?

Fairly sure from memory it's 256 ... vs 22 to FlexIO RAM.

What if you used a little bit of code on the SPE to do data compression? Maybe the combination could do the trick. Would still be very hard though.
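
To make the compression idea concrete (purely illustrative, nothing SPU-specific about it): even a dumb byte-wise run-length pack run on the SPE before/after the transfer would look like the sketch below, though real PS2 data would want a format-aware scheme.

#include <stddef.h>

/* Toy run-length encoder: the "little bit of code" squeezing a stream on the SPE. */
size_t rle_pack(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        size_t run = 1;
        while (i + run < n && in[i + run] == in[i] && run < 255)
            run++;
        out[o++] = (unsigned char)run;   /* run length */
        out[o++] = in[i];                /* run value  */
        i += run;
    }
    return o;   /* packed size; up to 2x n for incompressible data, so it only helps sometimes */
}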
 
Arwin, your 256 number (actually 204.9 GB/s summed) is the EIB bandwidth, right?

I think Weaste wants the local store bandwidth within the SPU itself. The EIB is for accessing another SPU's local store. Accessing "your own" local store should be much faster.
 
Arwin, your 256 number (actually 204.9 GB/s summed) is the EIB bandwidth, right?

I think Weaste wants the local store bandwidth within the SPU itself. The EIB is for accessing another SPU's local store. Accessing "your own" local store should be much faster.

Key features of the LS include:
Holds instructions and data
16-bytes-per-cycle load and store bandwidth, quadword aligned only
128-bytes-per-cycle DMA-transfer bandwidth
128-byte instruction prefetch per cycle

but... DMA reads and writes are executed once every 16 cycles (at most)...

DMA reads and writes always have highest priority. Because hardware supports 128-bit DMA reads and writes, these operations occupy, at most, one of every eight cycles (one of sixteen for DMA reads, and one of sixteen for DMA writes) to the LS. Thus, except for highly optimized code, the impact of DMA reads and writes on LS availability for loads, stores, and instruction fetches can be ignored.

(from IBM's Programming Tutorial)

The table lists LOADS and STORES touching LS as having a maximum occupancy of 1 SPU cycle (so it would seem that they have a throughput of 1, that is, you can issue one every cycle... but maybe I am getting confused by something).

16 bytes/cycle * 3.2 GHz = 51.2 GB/s (which is not half bad, especially when you consider that it is feeding a 128 x 128-bit register file and that your code works out of that huge register file).
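
As a rough cross-check on the DMA side of the same table: 128 bytes per LS access, at most once every 16 cycles, averages 8 bytes/cycle, i.e. 8 B x 3.2 GHz = 25.6 GB/s per direction, which is why DMA barely dents the 51.2 GB/s available to LOADS and STORES.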
 
Thanks Pana !

The EIB bandwidth is shared among 8 units (7 SPUs + 1 PPU, with the 8th SPU disabled). The "internal" local store bandwidth should be dedicated to just one SPU. DMA has negligible impact on this bandwidth according to: http://www.mc.com/uploadedfiles/cell-perf-simple.pdf

Arwin's 256 GB/s number applies to a 4 GHz Cell (theoretical max of the EIB). PS3 has a 3.2 GHz Cell, hence 256 x 3.2/4 = 204.8 GB/s (not 204.9).

What is the GS bandwidth in question?
 
I am not sure if you guys remember this, but Kutaragi was quoted in an old interview saying that the plan was always to go with software BC; the program had been worked on for some time but was not ready for the PS3 launch, so they had to put the full PS2 hardware in launch units. Does anyone still have this interview?

So this whole software BC patent is not news or a sudden development, unless Kutaragi was lying. What interests me is how the project has changed after he stepped down as CEO. Will Kaz put as much faith in continuing the software development, or put it on ice and stick with the no-BC stance?
 
Last July: http://www.engadget.com/2008/07/16/scea-ceo-jack-tretton-dishes/

On backwards compatibility: Jack explained that Sony looked at how to "not take a greater hit on production cost, without losing PlayStation's heritage ... Hardware / software for backwards compat wasn't all that expensive. ... but we're selling PS2 software to PS2 customers, and selling PS3 software to PS3 consumers." Still, Jack seems to feel like it may have been the wrong move. "I would like to have had it in there, but Sony's collective strategy determined we could afford to lose it. We've now gone down that road, and we're not going back."

I think Sony wants PS3 owners to support PS3 developers as much as possible. Not sure if they have changed position (e.g., More PS1 games available in US PSN now). It's hard to gauge Sony's move sometimes.
 
They may have an eye on Wii's VC and be thinking "hey, this is easy money for old rope!" and now want to roll out retro titles.
 
but... DMA reads and writes are executed once every 16 cycles (at most)...

(from IBM's Programming Tutorial)

The table lists LOADS and STORES touching LS as having a maximum occupancy of 1 SPU cycle (so it would seem that they have a throughput of 1, that is, you can issue one every cycle... but maybe I am getting confused by something).

16 bytes/cycle * 3.2 GHz = 51.2 GB/s (which is not half bad, especially when you consider that it is feeding a 128 x 128-bit register file and that your code works out of that huge register file).

Sorry, yes, you're right, it's 16 bits per cycle for the local store, and 1024 bits every 16 cycles for DMA transfer, which takes just one cycle, so that there's very little contention and it reduces the local store access speed only by 1/16th.

Actually patsu, the 256 figure is the 360's eDRAM bandwidth that I got confused with ... :D :oops:
 
So in general here, if direct emulation of the hardware is the approach to be used, and ignoring how much of Cell is required to emulate the Emotion Engine: looking around for specifications, for a game running at 480p with 32 bits for colour/alpha and a 32-bit Z, the frame buffer (I suppose most games don't go above this) would be around 2.5 MB, and we would need to read/write to it at 40 GB/s and pull textures into Cell at 10 GB/s? Is this correct?
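
(Quick sanity check on the size, assuming a plain 640x480 target: 640 x 480 pixels x (4 bytes colour/alpha + 4 bytes Z) = 2,457,600 bytes, i.e. roughly 2.3 MB, which is in line with the ~2.5 MB figure.)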
 
So in general here, if direct emulation of the hardware is the approach to be used, and ignoring how much of Cell is required to emulate the Emotion Engine: looking around for specifications, for a game running at 480p with 32 bits for colour/alpha and a 32-bit Z, the frame buffer (I suppose most games don't go above this) would be around 2.5 MB, and we would need to read/write to it at 40 GB/s and pull textures into Cell at 10 GB/s? Is this correct?

Not sure where you got the numbers from, but I am assuming they derive from the GS eDRAM bandwidth somehow.

Anyway, that may be the case if we assume an instruction-for-instruction emulation of the GS with no cache being used, but if there is an intelligent JIT compiler (or offline compiler, for that matter) that translates multi-pass GS instruction sequences into single-pass RSX instruction sequences, the bandwidth requirement may be significantly reduced. The RSX's cache may be of some help as well.
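
Hand-waving, but the shape of such a translator might be a lookup cache keyed on the GS state of a draw batch: everything below (the types, gs_batch_hash(), build_rsx_state(), interpret_batch(), submit_rsx()) is hypothetical, and collision handling is omitted.

#include <stdint.h>
#include <stddef.h>

typedef struct { uint64_t regs[16]; } gs_batch;   /* snapshot of the GS registers relevant to one batch */
typedef struct { void *program; }     rsx_state;  /* compiled single-pass RSX state + shader            */

#define CACHE_SIZE 1024
static rsx_state *cache[CACHE_SIZE];

extern uint64_t   gs_batch_hash(const gs_batch *b);     /* hypothetical */
extern rsx_state *build_rsx_state(const gs_batch *b);   /* the JIT / offline compile step */
extern void       submit_rsx(const rsx_state *s);       /* hypothetical RSX submission    */
extern void       interpret_batch(const gs_batch *b);   /* slow reference path            */

void draw_batch(const gs_batch *b)
{
    size_t slot = (size_t)(gs_batch_hash(b) % CACHE_SIZE);
    if (!cache[slot]) {
        interpret_batch(b);                 /* correct but bandwidth-hungry the first time   */
        cache[slot] = build_rsx_state(b);   /* translate the multi-pass GS sequence once...  */
        return;
    }
    submit_rsx(cache[slot]);                /* ...then replay it as a single RSX pass        */
}

The point being that once a sequence has been translated, the repeated passes (and their read-modify-write traffic) can collapse into ordinary RSX draws, which is where the bandwidth saving would come from.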
 
Thanks Pana !

The EIB bandwidth is shared among 8 units (7 SPUs + 1 PPU, with the 8th SPU disabled). The "internal" local store bandwidth should be dedicated to just one SPU. DMA has negligible impact on this bandwidth according to: http://www.mc.com/uploadedfiles/cell-perf-simple.pdf

Arwin's 256 GB/s number applies to a 4 GHz Cell (theoretical max of the EIB). PS3 has a 3.2 GHz Cell, hence 256 x 3.2/4 = 204.8 GB/s (not 204.9).

What is the GS bandwidth in question?

The GS has an array of Pixel Engines connected to page buffers (8 KB in size). The Pixel Engines have an aggregate bandwidth of 48 GB/s thanks to a very wide bus solution (1,024-bit read + 1,024-bit write + 512-bit texture buses, shared by the 16 Pixel Engines).

The 48 GB/s number is the bandwidth between Pixel Engines and page buffers. The actual page buffers are refilled from the DRAM macros at a much higher speed than that... I think the speed is something like 150+ GB/s between page buffers and DRAM macros IIRC.

Another issue with GS emulation is the fact that the GS supports two rendering contexts (duplicated sets of registers) and was able to quickly change primitive type, trash the texture and Z-buffer, etc... several times per frame without breaking a sweat, something that modern GPUs do not like too much. To that you can add tons of PS2 developers who exploited every feature and bug to their advantage to gain a few more fractions of a ms of rendering time... which will not make emulator writers that happy :p.
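
(The bus widths do add up, for what it's worth: 1,024 + 1,024 + 512 bits = 320 bytes per clock, and at the GS's roughly 150 MHz clock that is about 48 GB/s.)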
 
Sorry, yes, you're right, it's 16 bits per cycle for the local store, and 1024 bits every 16 cycles for DMA transfer, which takes just one cycle, so that there's very little contention and it reduces the local store access speed only by 1/16th.

I think that should be 16 bytes per cycle for LS, quadword aligned.

DMA is 128 bytes per cycle once it's set up. I don't think they interfere with each other's bandwidth much since local store loads do not go through DMA.
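
For what it's worth, "quadword aligned only" just means every LS load/store moves a full 16-byte vector; a trivial sketch with the SPU C intrinsics (the 64-element buffer is arbitrary):

#include <spu_intrinsics.h>

static vector float buf[64];              /* vector data in LS is naturally 16-byte aligned */

vector float sum_buf(void)
{
    vector float acc = spu_splats(0.0f);  /* replicate 0.0f across a quadword */
    for (int i = 0; i < 64; i++)
        acc = spu_add(acc, buf[i]);       /* each iteration is one full 16-byte load from LS */
    return acc;
}

Even a scalar load goes through a full quadword fetch plus an extract, which is why the 16 bytes/cycle figure is the one that matters.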


The GS has an array of Pixel Engines connected to page buffers (8 KB in size). The Pixel Engines have an aggregate bandwidth of 48 GB/s thanks to a very wide bus solution (1,024-bit read + 1,024-bit write + 512-bit texture buses, shared by the 16 Pixel Engines).

The 48 GB/s number is the bandwidth between Pixel Engines and page buffers. The actual page buffers are refilled from the DRAM macros at a much higher speed than that... I think the speed is something like 150+ GB/s between page buffers and DRAM macros IIRC.

Another issue with GS emulation is the fact that the GS supports two rendering contexts (duplicated sets of registers) and was able to quickly change primitive type, trash the texture and Z-buffer, etc... several times per frame without breaking a sweat, something that modern GPUs do not like too much. To that you can add tons of PS2 developers who exploited every feature and bug to their advantage to gain a few more fractions of a ms of rendering time... which will not make emulator writers that happy :p.

Ouch, it would be a technical marvel (or a dreadful hack) if Sony came up with a PS2 emulator for the PS3 without fudging the source.
 
I think that should be 16 bytes per cycle for LS, quadword aligned.

DMA is 128 bytes per cycle once it's set up. I don't think they interfere with each other's bandwidth much since local store loads do not go through DMA.

No, but the LS can support only so many concurrent accesses. Which is why, despite having the highest priority, DMA reads and writes occur at most once every 16 SPU cycles... so as not to take away too much bandwidth from LOADS and STORES.
 
Last July: http://www.engadget.com/2008/07/16/scea-ceo-jack-tretton-dishes/

I think Sony wants PS3 owners to support PS3 developers as much as possible. Not sure if they have changed position (e.g., More PS1 games available in US PSN now). It's hard to gauge Sony's move sometimes.

Yes, remember they used to have a very detailed webpage about the compatibility of many PS2 games? I think PS2 BC would be a piece of cake financially and technically if Sony wanted it; it could be fully software, or it could need a small low-cost extra chip. Management's stance is the deciding factor. I cannot blame them, as we are past the point of BC being a marketing victory. I don't expect them to backtrack so soon, but BC could come back in the last legs of a PS3 Slim(-mer) and once the PS2 stops selling.
 