Will the PS3 be able to decode H.264/AVC at 40 Mbps?

Will the PS3 decode H.264/AVC at 40 Mbps?

  • Yes, the PS3 will decode H.264/AVC at 40 Mbps

    Votes: 86 86.9%
  • No, the PS3 won't decode H.264/AVC at 40 Mbps

    Votes: 13 13.1%

  • Total voters
    99
Another thing I was pondering is audio decode.

Correct me if I'm wrong. Sony will have to dedicate an entire SPE to handle the task of decoding the audio. Or is the audio decode process already baked into the entire "decode" package?

So 7 usable SPEs for each PS3 CELL CPU.

1 SPE reserved for the operating system

1 SPE slaved to audio decode

5 SPEs left over to decode video
 
Yeah, the OS and audio will take some SPE resources away (not sure you really need 1 full SPE for audio though, and the OS might have partial audio support).

But if...
* The slices (starting from sync-points) can be parallelized across multiple SPE cores,
* Other AVC HP profile stages can be parallelized even more easily (either using multiple SPEs or using the SIMD engine), and
* Within each slice, the sequential CABAC code seems rather predictable and friendly to a NUMA-style architecture, so the entire thing can run within Local Store (which has 4-6 cycles, 2-4 ns latency),

then Cell should outperform any GPP in decoding the AVC HP profile (probably by a few fold). Not sure whether that's enough for 40 Mbps, but at least there are no inherent restrictions that prevent Cell from performing its magic. Correct?
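
Just to make the slice-parallel idea concrete, here's a rough PPE-side sketch of what dispatching independent slices to SPEs could look like, assuming libspe2 and a hypothetical decode_slice_spu program (this is a sketch, not Sony's actual decoder):

#include <libspe2.h>
#include <pthread.h>

#define NUM_SPES 5                      /* the SPEs left over for video in the example above */

/* Hypothetical embedded SPU program that decodes one slice out of main memory. */
extern spe_program_handle_t decode_slice_spu;

typedef struct {
    void *slice_data;                   /* one independently decodable slice */
    unsigned int slice_size;
} slice_job_t;

static void *run_spe(void *arg)
{
    slice_job_t *job = (slice_job_t *)arg;
    spe_context_ptr_t ctx = spe_context_create(0, NULL);
    unsigned int entry = SPE_DEFAULT_ENTRY;

    spe_program_load(ctx, &decode_slice_spu);
    /* argp carries the slice description; the SPU program DMAs the bits it needs. */
    spe_context_run(ctx, &entry, 0, job, NULL, NULL);
    spe_context_destroy(ctx);
    return NULL;
}

/* Kick off up to NUM_SPES slice decodes in parallel; real code would reuse
   contexts and queue slices instead of creating/destroying per slice. */
void decode_slices_parallel(slice_job_t jobs[], int n)
{
    pthread_t threads[NUM_SPES];
    int i;
    for (i = 0; i < n && i < NUM_SPES; i++)
        pthread_create(&threads[i], NULL, run_spe, &jobs[i]);
    for (i = 0; i < n && i < NUM_SPES; i++)
        pthread_join(threads[i], NULL);
}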
 
Another thing I was pondering is audio decode.

Correct me if I'm wrong. Sony will have to dedicate an entire SPE to handle the task of decoding the audio. Or is the audio decode process already baked into the entire "decode" package?

So 7 usable SPEs for each PS3 CELL CPU.

1 SPE reserved for the operating system

1 SPE slaved to audio decode

5 SPEs left over to decode video


All the audio formats supported on BR are uncompressed or lossless formats. (link) Even if it was compressed, there should be no reason for it to take up a whole SPU.

I don't recall hearing about a huge explosion of CPU power required for audio decode algorithms. Audio decode was running just fine back in the Pentium II days, even when clocks were still under 300 MHz.

I think the only time people have talked about reserving a whole SPU for audio is when modeling 3D sound acoustics and mixing in hundreds of channels, etc.
 
All the audio formats supported on BR are uncompressed or lossless formats. (link) Even if it was compressed, there should be no reason for it to take up a whole SPU.

I don't recall hearing about a huge explosion of CPU power required for audio decode algorithms. Audio decode was running just fine back in the Pentium II days, even when clocks were still under 300 MHz.

I think the only time people have talked about reserving a whole SPU for audio is when modeling 3D sound acoustics and mixing in hundreds of channels, etc.


It's not a question of how much overkill a single SPE is for audio, but the nature of the CELL architecture: there isn't a cache, only Local Store.


patsu said:
Cell should outperform any GPP in decoding the AVC HP profile (probably by a few fold). Not sure whether that's enough for 40 Mbps, but at least there are no inherent restrictions that prevent Cell from performing its magic. Correct?

Microsoft created VC-1, which I'm guessing is going to run great on CELL. At the end of the day it boils down to this: VC-1 placed first in all of the codec shootouts, while H.264 AVC placed third. So as long as VC-1 runs up to 40 Mbps on the PS3, I think that is fantastic for Blu-ray.

As far as the history of CELL goes, I am curious about any impact H.264 AVC had on the number of SPEs and the decision to go with MPEG-2 at high bit rates.
 
It's not a question of how much overkill a single SPE is for audio, but the nature of the CELL architecture: there isn't a cache, only Local Store.

... but for large, streaming audio/video data, caching is not effective. The Local Store approach is designed specifically to address "dynamic applications" like this. Data can be DMA'ed asynchronously into the Local Store for very quick processing. There should be enough calculation to stagger/hide subsequent memory transfers and sustain the pipeline.
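
To illustrate what that staggering looks like in practice, here's a minimal double-buffering sketch on the SPU side, assuming the standard spu_mfcio.h intrinsics; process_block() is a hypothetical stand-in for the actual decode work, and the total length is assumed to be a multiple of the chunk size to keep it short:

#include <spu_mfcio.h>

#define CHUNK 16384                         /* max size of a single DMA transfer */

static char buf[2][CHUNK] __attribute__((aligned(128)));

/* Hypothetical worker: whatever per-chunk processing the decoder does. */
extern void process_block(char *data, unsigned int size);

void stream_process(unsigned long long ea, unsigned int total)
{
    unsigned int cur = 0, next = 1, off = 0;

    /* Prime the pipeline: fetch the first chunk into buffer 0. */
    mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);

    while (off + CHUNK < total) {
        /* Kick off the DMA for the *next* chunk before touching the current one. */
        mfc_get(buf[next], ea + off + CHUNK, CHUNK, next, 0, 0);

        /* Wait only for the current buffer's tag, then crunch it while
           the other transfer is still in flight. */
        mfc_write_tag_mask(1 << cur);
        mfc_read_tag_status_all();
        process_block(buf[cur], CHUNK);

        off += CHUNK;
        cur ^= 1;
        next ^= 1;
    }

    /* Drain the last buffer. */
    mfc_write_tag_mask(1 << cur);
    mfc_read_tag_status_all();
    process_block(buf[cur], total - off);
}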

Microsoft created VC-1, which I'm guessing is going to run great on CELL. At the end of the day it boils down to this: VC-1 placed first in all of the codec shootouts, while H.264 AVC placed third.

I don't know about that ranking in the PS3 context. I have not seen Sony's VC-1 and AVC HP Profile *implementations* on Cell. They may perform equally well, or one may outperform the other (just because Sony spends more time on it). But yes, it would be great if both support the maximum Blu-ray bitrate.

As far as the history of CELL goes, I am curious about any impact H.264 AVC had on the number of SPEs and the decision to go with MPEG-2 at high bit rates.

If there is no reason why AVC HP Profile should perform badly on Cell, then the above question may be irrelevant. Decisions like this may have more to do with business deals, royalties, preserving existing investments, positioning, ...
 
Microsoft created VC-1
But it's based on MPEG-4 and shares algorithms with AVC (HP) and MPEG-2. BDA companies such as Sony and Matsushita can get a slice of the VC-1 license fee too.

http://www.mpegla.com/news/n_06-08-17_pr.pdf
MPEG LA Announces VC-1 License Terms
License Agreement Expected to Issue During 2006

(Denver, Colorado, US – 17 August 2006) – MPEG LA, LLC announced today that an initial group of essential patent holders have agreed on final terms of license to be included in the VC-1 Patent Portfolio License (“License”), expected to issue during 2006. A summary of the license terms is attached.

“This represents an extraordinary and persistent effort by a devoted group of patent owners acting in the public interest,” said MPEG LA Chief Executive Officer Larry Horn. “By the patent owners’ commitment to make available a joint patent license for the convenience of VC-1 adopters, consumers benefiting from a marketplace of competitive technology choices are the clear winners.”

The VC-1 essential patent holders currently include DAEWOO Electronics Corporation, France Télécom, société anonyme, Fujitsu Limited, Koninklijke Philips Electronics N.V., LG Electronics Inc., Matsushita Electric Industrial Co., Ltd. (Panasonic), Mitsubishi Electric Corporation, Microsoft Corporation, Nippon Telegraph and Telephone Corporation (NTT), Samsung Electronics Co., Ltd., Sharp Corporation, Sony Corporation, Telenor ASA, Toshiba Corporation, and Victor Company of Japan, Limited (JVC).
 
It's not a question of how much overkill a single SPE is for audio, but the nature of the CELL architecture: there isn't a cache, only Local Store.
Yes, so what?

That's still no reason why an entire SPU would have to be set aside for audio.
 
It's not a question of how much overkill a single SPE is for audio, but the nature of the CELL architecture: there isn't a cache, only Local Store.

Microsoft created VC-1, which I'm guessing is going to run great on CELL. At the end of the day it boils down to this: VC-1 placed first in all of the codec shootouts, while H.264 AVC placed third. So as long as VC-1 runs up to 40 Mbps on the PS3, I think that is fantastic for Blu-ray.

As far as the history of CELL goes, I am curious about any impact H.264 AVC had on the number of SPEs and the decision to go with MPEG-2 at high bit rates.

Toshiba's interest in the Sony-IBM-Toshiba partnership is entirely based on putting Cell into HDTV sets to decode multiple MPEG4 streams. Toshiba has announced that it will incorporate Cell into all its HDTVs. The reason for using Cell in an HDTV is to decode five or six MPEG streams simultaneously, scaling and compositing them and displaying them on a single screen: a main picture with, say, four or five small live screens that you can watch and switch to.

see http://www.hotchips.org/archives/hc17/2_Mon/HC17.S1/HC17.S1T3.pdf#search="toshiba Cell SCC"

Also Toshiba has demonstrated Cell decoding 48 MPEG2 streams simultaneously.

The suggestion that Cell might not be suited to decoding MPEG4 because of design limitations, and that it might have trouble handling the 40 Mbps bit rate of BD compared to half that on HD-DVD, seems a little ridiculous in this context.

The low maximum bit rate of HD-DVD certainly would have prevented studios from using MPEG2 for HD-DVD - which is why movie studios might have licensed VC-1 for HD-DVD and MPEG2 for BD now, even though they intend to shift to MPEG4 (H.264) in the long run.
 
Toshiba's interest in the Sony-IBM-Toshiba partnership is entirely based on putting Cell into HDTV sets to decode multiple MPEG4 streams.
I don't think that's true at all. The player or source decompresses the optical disc or satellite/cable data and provides uncompressed data to the TV. If TVs were to receive compressed signals, we wouldn't need HDMI at all, as a 10-year-old tech like FireWire is fast enough to carry 40 Mbps.

The reason for Cell in TVs is to process the raw image data with various image enhancement and scaling methods. You may get Cell doing decompression for TVs with built-in receivers, whether for digital radio transmissions or cable, but they won't need to deal with any more data than the service feeding them - certainly not multiple HD streams, because the source isn't that fast. In fact I don't think 40 Mbps 1080p @ 60 fps is even possible as a transmission format yet, as neither airwaves nor cables are fast enough to carry it at the moment, except of course in Korea :p
 
In fact I don't think 40 Mbps 1080p @ 60 fps is even possible as a transmission format yet, as neither airwaves nor cables are fast enough to carry it at the moment, except of course in Korea

I think the 2.5Gbit broadband in Paris might be able to do it ;-)
At that speed it might even be able to do it uncompressed!
 
My primary interest in this thread is to learn how to apply the Cell architecture to real-world problems.

Seriously, it seems that CABAC is NUMA friendly (even with its 399 contexts). So Cell should enjoy an order-of-magnitude advantage in memory access (due to Local Store). It also has enough math power to zip through the computations.

As mentioned by ADEX earlier, if we run each slice (starting at a sync point) in parallel (using 2 or more SPEs), that's another multiplier in speed-up on top of the very quick Cell CABAC stage.

The other stages are also easily parallelizable, according to the parallel CABAC paper.

I wouldn't worry too much about CABAC at a 40 Mbps bitrate (5 times the workload of 8 Mbps, assuming linear complexity)... unless someone has specific details about the problem.

EDIT:
What is the average bitrate for a BR movie? 15 Mbps?
 
I don't think that's true at all. The player or source decompresses the optical disc or satellite/cable data and provides uncompressed data to the TV. If TVs were to receive compressed signals, we wouldn't need HDMI at all, as a 10-year-old tech like FireWire is fast enough to carry 40 Mbps.

The reason for Cell in TVs is to process the raw image data with various image enhancement and scaling methods. You may get Cell doing decompression for TVs with built-in receivers, whether for digital radio transmissions or cable, but they won't need to deal with any more data than the service feeding them - certainly not multiple HD streams, because the source isn't that fast. In fact I don't think 40 Mbps 1080p @ 60 fps is even possible as a transmission format yet, as neither airwaves nor cables are fast enough to carry it at the moment, except of course in Korea :p

A cable TV line carries a lot of bandwidth (by using multiple RF carriers on the same cable).

http://www.cs.bris.ac.uk/~janko/city/DBT_05_DVB-C_DVB-S_DVB-T.pdf#search="cable TV bandwidth"
http://www.cabledigitalnews.com/cmic/cmic1.html

Each standard television channel occupies 6 MHz of RF spectrum. Thus a traditional cable system with 400 MHz of downstream bandwidth can carry the equivalent of 60 analog TV channels and a modern HFC system with 700 MHz of downstream bandwidth has the capacity for some 110 channels.

MPEG2 = 6 Mbps
110 x MPEG2 = 660 Mbps = 82.5 MBps
 
Which is why we aren't technically in a position for 40 Mbps transmissions, because you'd only get a handful of channels! TVs aren't going to need to decode 40 Mbps H.264 video streams until they're being supplied that data. They're definitely not going to need to decompress multiple HD streams, because we're miles away from being able to supply multiple HD streams. And besides which, the cable boxes or HD players have processors for decompression, so TVs don't have to worry about that job. The day we do get 40 Mbps H.264 over cable, the cable box will still be providing uncompressed signals to the TV over HDMI (unless there's a whole new format by then, which might be the case). I don't expect HD transmission to go much beyond 20 Mbps, given the savings in BW can be used for more channels, and that's all the people who design these systems care about - he says after last night watching some abysmal NTL cable TV with disfigured heads thanks to the joy of digital, and thanking Blighty that he still sticks to analogue despite apparent measures by the powers that be to decrease analogue's quality to promote digital as a better platform.
 
The suggestion that Cell might not be suited to decoding MPEG4 because of design limitations, and that it might have trouble handling the 40 Mbps bit rate of BD compared to half that on HD-DVD, seems a little ridiculous in this context.

~2/3s, actually. The max video bitrate on HD DVD is ~30 Mbps, not 20 Mbps. And PCs (even multicore) today still have problems doing H.264@HP at anything above 20 Mbps, and that's with GPU assist, huge honking 2 MB caches, branch-friendlier CPUs, OOOe, and SIMD.

I'm sure that CELL can handle it, but I wouldn't be surprised if careful optimization was needed.
 
aaaaa00, do you happen to know how a GPU can help here (unless they have custom logic)?

If CABAC is sequential... it seems that using a GPU may not be an efficient solution without special circuitry. There may be lots of wasted cycles while waiting for results, unless the devs can unroll and reorder the math somehow. But if the latter scenario is possible, it means that Cell can use the same advantage.

Also I'm not sure having a 2 MB cache will help here, since we only zip through the video once. The contexts don't take up that much space.

If it's just computational branches, Cell has predication (i.e., calculate both and pick a result) that can avoid branch overhead.

Finally, as for OoO execution, there should be enough computation for the devs to hide the latency (and sustain the throughput). The SPEs are dual-issue... one pipe for regular math work, and the other parallel pipe for memory loads, branches and bit-shifts (if I remember correctly). Like you mentioned, it will be more work for the Cell programmer to optimize the code... but it looks doable.

Besides the Local Store advantage, there are also hardware communication primitives (like fences and barriers) to help multiple SPEs work efficiently together on separate slices concurrently. But I agree with you that it will take a lot of optimization and testing to get everything to work fast.
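
As a concrete (if simplified) illustration of the predication point above, this is roughly what a branch-free select looks like with the SPU intrinsics: spu_cmpgt builds a mask and spu_sel picks per-element results, so both candidates are computed and no branch is taken (a generic example, not decoder code):

#include <spu_intrinsics.h>

/* Branch-free clamp of four packed ints to an upper bound:
   compute the comparison mask, then select between the two
   candidate results instead of branching per element. */
vec_int4 clamp_to_max(vec_int4 v, vec_int4 max_val)
{
    vec_uint4 gt = spu_cmpgt(v, max_val);   /* all-ones where v > max_val */
    return spu_sel(v, max_val, gt);         /* take max_val where the mask is set */
}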
 
aaaaa00, do you happen to know how GPU can help here ? (unless they have custom logic).

I don't have any idea how RSX can help, since I don't know anything about RSX. Assuming RSX has the type of acceleration PC GPUs do, I presume it can help in the same fashion.

Also I'm not sure having a 2 MB cache will help here, since we only zip through the video once. The contexts don't take up that much space.

In general big honking caches can't hurt, everything else being equal.

And with a big honking cache, you can prefetch a lot more, which ends up having roughly the same effect as having SPU local store anyway.
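
For what it's worth, on a PC that kind of prefetching usually looks something like the SSE hint below; decode_block() is just a hypothetical placeholder for the per-block work:

#include <xmmintrin.h>

/* Hypothetical stand-in for the per-block decode work. */
extern void decode_block(const char *p, int size);

void decode_stream(const char *data, int total, int block)
{
    for (int off = 0; off < total; off += block) {
        /* Hint the CPU to pull the *next* block toward the caches
           while we are still working on the current one. */
        if (off + block < total)
            _mm_prefetch(data + off + block, _MM_HINT_T0);
        decode_block(data + off, block);
    }
}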

If it's just computational branches, Cell has predication (i.e., calculate both and pick a result) that can avoid branch overhead.

Finally, as for OoO execution, there should be enough computation for the devs to hide the latency (and sustain the throughput). The SPEs are dual-issue... one pipe for regular math work, and the other parallel pipe for memory loads, branches and bit-shifts (if I remember correctly). Like you mentioned, it will be more work for the Cell programmer to optimize the code... but it looks doable.

Besides the Local Store advantage, there are also hardware communication primitives (like fences and barriers) to help multiple SPEs work efficiently together on separate slices concurrently. But I agree with you that it will take a lot of optimization and testing to get everything to work fast.

Do you know any of this for sure, or are you just speculating and throwing around buzzwords? ;)

Arguing over specs and papers isn't worth 1/10th of actually going and trying it, so unless someone who's actually writing it for PS3 shows up and starts talking, this thread is pretty much useless.

The only thing is, I can tell you what I found out from building and running libavcodec on my PC: CABAC is a non-trivial component of H.264 HP decode, H.264 HP decode in general is computation, branch, and memory-access heavy, and based on what I see there, in my opinion, it will take some significant work to optimize on PS3.

But of course I could very well be wrong, since I'm not actually implementing the PS3 decoder.
 
I don't have any idea how RSX can help, since I don't know anything about RSX. Assuming RSX has the type of acceleration PC GPUs do, I presume it can help in the same fashion.

I was talking about "And PCs (even multicore) today still have problems doing H.264@HP at anything above 20 Mbps, and that's with GPU assist, huge honking 2 MB caches, branch-friendlier CPUs, OOOe, and SIMD". The PC GPUs you mentioned here - were you referring to, say, nVidia's PureVideo part or something else? Is it confirmed that PureVideo or Avivo cannot handle a 20+ Mbps AVC HP stream, for example?

In general big honking caches can't hurt, everything else being equal.

And with a big honking cache, you can prefetch a lot more, which ends up having roughly the same effect as having SPU local store anyway.

Certainly, but the outcome may be different. Local Store performance is in the range of an L1 cache, whereas the 2 MB cache is most likely L2 on common PCs. Also, when you prefetch, how much memory can you move and how fast? Is it asynchronous? I'm asking just to explore the differences in memory architecture more.

Do you know any of this for sure, or are you just speculating and throwing around buzzwords? ;)

Arguing over specs and papers isn't worth 1/10th of actually going and trying it, so unless someone who's actually writing it for PS3 shows up and starts talking, this thread is pretty much useless.

Well, in this thread I learn more about what Cell has and doesn't have... even if they are buzzwords. It helps me confirm my understanding of the differences between Cell and GPPs.

The only thing is, I can tell you what I found out from building and running libavcodec on my PC: CABAC is a non-trivial component of H.264 HP decode, H.264 HP decode in general is computation, branch, and memory-access heavy, and based on what I see there, in my opinion, it will take some significant work to optimize on PS3.

But of course I could very well be wrong, since I'm not actually implementing the PS3 decoder.

Sure. Like I said, I'm just trying to understand the Cell architecture more with your help ;-)
Cell will be "hard to program" either way.
 
There sure have to be sync-points, so you can jump to frame X without running through ALL preceding frames. As soon as you have such sync-points you can trivially start processing from multiple sync-points...
By "sync-points" I guess you mean "I-frames". These might be every 0.5~1 seconds or perhaps even further apart. If you want to try that with HD data then you'd better have a lot of RAM.

CABAC is most expensive in the encode, not the decode.
I would have thought it was the other way around. With encode you know the contexts in advance, since you are doing the "binarisation" (i.e. VLC encoding) first, which is then encoded with the arithmetic encoder on a bit-by-bit basis. When decoding, OTOH, you (usually) have to decode each bit in order to decide how to decode the next one.
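
That serial dependency is easiest to see in code. Below is a grossly simplified binary arithmetic decoder (toy state update, not the real H.264 rangeTabLPS/transIdx tables, and initialization is omitted) just to show that each bin mutates the range, the offset and the context, so the next bin can't start until the previous one has finished:

/* Grossly simplified binary arithmetic decoder, only to illustrate the serial
   dependency in CABAC decoding. */
typedef struct {
    unsigned state;     /* probability state index, 0..62 in real CABAC */
    unsigned mps;       /* current most-probable symbol, 0 or 1 */
} cabac_ctx_t;

static unsigned range, offset;           /* decoder interval, 9-bit in real CABAC */
static const unsigned char *bits;        /* bitstream */
static unsigned pos;                     /* bit position */

static unsigned read_bit(void)
{
    unsigned b = (bits[pos >> 3] >> (7 - (pos & 7))) & 1;
    pos++;
    return b;
}

static unsigned decode_bin(cabac_ctx_t *c)
{
    unsigned lps = ((range >> 8) * c->state) + 1;   /* stand-in for the rangeTabLPS lookup */
    unsigned bin;

    range -= lps;
    if (offset >= range) {                /* least-probable symbol path */
        offset -= range;
        range = lps;
        bin = c->mps ^ 1;
        if (c->state == 0) c->mps ^= 1;   /* toy version of transIdxLPS */
        else c->state--;
    } else {                              /* most-probable symbol path */
        bin = c->mps;
        if (c->state < 62) c->state++;    /* toy version of transIdxMPS */
    }

    while (range < 256) {                 /* renormalize, pulling in new bits */
        range <<= 1;
        offset = (offset << 1) | read_bit();
    }
    return bin;
}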
 