Will the PS3 be able to decode H.264/AVC at 40 Mbps?

Will the PS3 decode H.264/AVC at 40 Mbps?

  • Yes, the PS3 will decode H.264/AVC at 40 Mbps

    Votes: 86 86.9%
  • No, the PS3 won't decode H.264/AVC at 40 Mbps

    Votes: 13 13.1%

  • Total voters
    99
I had a look at a couple of the links and also had a quick look over the code. I can see why you think it'll be Cell unfriendly as the CABAC section is jammed with branches everywhere.

A lot of SPE optimisation is based on branch removal using techniques such as predication, where you compute both sides of a branch and then "select" the answer; this might be possible here.
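
A rough sketch of the predication idea in plain C: evaluate both sides of the "branch", then pick one result with a mask and no jump at all. On an SPE the same pattern maps onto the spu_sel intrinsic over full vectors; the names below are just illustrative.

```c
#include <stdint.h>

/* Branch-free "select": both candidate values are already computed,
   and a mask picks one without any conditional jump. */
static uint32_t select_u32(uint32_t mask, uint32_t a, uint32_t b)
{
    /* mask must be all-ones (pick a) or all-zeros (pick b) */
    return (a & mask) | (b & ~mask);
}

/* Example: a branch-free max(x, y) built from the select */
static uint32_t branchless_max(uint32_t x, uint32_t y)
{
    uint32_t mask = (uint32_t)-(int32_t)(x > y); /* all-ones iff x > y */
    return select_u32(mask, x, y);
}
```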

Also, the data in question appears to be very small so a single SPE will have no problem holding it in a local store, it might even be possible to hold it in the register file!

It does look like a pretty serious bottleneck but it doesn't look like a worst case scenario for Cell.
 
That reads like a typo

CABAC would run badly on Cell given that it can't be parallelized

or

CABAC wouldn't run badly on Cell given that it can be parallelized

As written it's nonsensical, and I guess the former was the point being made: that the method can't be parallelized and would be limited to one core, probably the PPE, which might be no more effective than a standard CPU.

Shifty, I made a (stupid) mistake in interpreting the first Cabac article. In any case the PPE approach is a fall back, there may be a better way to do it in the SPEs.

The "High Speed Decoding of Context-based Adaptive Binary Arithmetic Codes Using Most Probable Symbol Prediction" paper mentions that:

(1) More than 90 percent of an H.264 Main Profile stream is encoded using CABAC (I'll assume the HP profile has similar characteristics)

(2) The decoding algorithm is basically sequential and needs heavy computation to calculate the range, offset and context variables

(3) Other stages of AVC (HP) decoding can be parallelized and pipelined relatively easily (Not sure how large a percentage of the work these constitute).

To address (2), the proposed system processes 2 symbols at a time, performing the second symbol's calculation speculatively by assuming:
(i) the next context is used, and
(ii) the decoded bin is the MPS,
achieving a 24% speed-up (empirically).
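
A toy model of the paper's two-symbols-at-a-time trick: here bins[] plays the role of the bin stream, the MPS is fixed at 0, and the second bin of each pair is "speculated" as the MPS while the first is decoded. When the guess holds, the pair costs one sequential step instead of two; on a mispredict we fall back and redo the second bin. (All names are hypothetical; the real decoder speculates the range/offset/context update, not the bin value directly.)

```c
enum { MPS = 0 }; /* toy: most-probable symbol assumed to be 0 */

/* Decode bins i and i+1 as a pair; *steps reports the sequential
   cost: 1 when the MPS speculation held, 2 when we had to redo. */
static void decode_pair(const int *bins, int i, int out[2], int *steps)
{
    out[0] = bins[i];            /* first bin: decoded for real       */
    if (bins[i + 1] == MPS) {    /* second bin: speculated as the MPS */
        out[1] = MPS;
        *steps = 1;              /* speculation paid off              */
    } else {
        out[1] = bins[i + 1];    /* mispredict: sequential redo       */
        *steps = 2;
    }
}
```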

For Cell, a dev may...

* Use SPE's raw computation power, predication and local store to speed up the "basic" calculations.

* Use 2 SPEs to parallelize/stagger the calculation of consecutive symbols (as in the paper).

* Instead of assuming "the next context is used", reorganize and fetch the contexts in structure-of-arrays (SOA) form into the local stores (if possible), e.g., fetch the same binary keys from all 399 contexts and perform SIMD calculations on them all at once locally. If this is possible, then a correct MPS guess would guarantee the right result.

* In addition, a 3rd SPE may be used to perform speculative calculation assuming that "Decoded bin is LPS". Again perform SIMD calculations on the right entry in all 399 contexts (if possible).

Need more info to know whether this is doable. We should be able to hit a 24% to 40+% speed-up, and hopefully also free up the PPE for other stuff?
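
A sketch of the structure-of-arrays layout suggested above: instead of ~399 context structs, keep each field in its own array so one pass can touch the same field of every context. The 62 cap mirrors CABAC's highest MPS probability state, but the update itself is a toy stand-in for the real transIdxMPS transition table.

```c
#include <string.h>

#define NUM_CTX 399 /* CABAC context count, per the paper */

typedef struct {
    unsigned char state[NUM_CTX]; /* probability-state index per context */
    unsigned char mps[NUM_CTX];   /* most-probable symbol per context    */
} CtxSoA;

/* Advance every context's state as if its MPS had just been decoded.
   Written as a scalar loop, but each iteration is independent, which
   is exactly what makes a 16-wide SPE vector pass possible. */
static void bulk_mps_update(CtxSoA *c)
{
    for (int i = 0; i < NUM_CTX; i++)
        if (c->state[i] < 62)
            c->state[i]++;
}
```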
 
If a high bit rate .h264 AVC movie can't run on the PS3, then movie studios will never release a movie encoded at that bit rate. Sony is going to be selling tens of millions of PS3s.
Which leads back to two points already raised - 1) Will any studios be encoding high bitrate .h264? and 2) Does not the BRD specification require that PS3 can decode h.264 at 40 Mbps to comply as a BRD player? At which point, either it's not a complete BRD player and Sony are evading their own standard, or we can trust that it can deal with the codec because it is a BRD player and that's a minimum requirement!
 
Be careful, this is pure PR talk; the result is not equal to an ideal high-bitrate situation. That's just my opinion.

It normally would be, but it was "CJplay", the Warner compressionist working on the title, who confirmed the ABR on Batman Begins. From all that's been said about the film, it'll be a showcase title from a PQ perspective. We'll see soon enough! I do agree that higher bitrates are good, but you can get to a point of diminishing returns, depending on how good the encoder is.
 
The ability of CELL to decode in the PS3 is important. It won't matter what standalone players decode, because the standard is only as strong as its weakest link. If a high bit rate .h264 AVC movie can't run on the PS3, then movie studios will never release a movie encoded at that bit rate. Sony is going to be selling tens of millions of PS3s.

That's my point and also why I think there is no problem. In the end, if Cell has issues with CABAC that can't be fixed via software upgrades (if there is a problem at all!), there will just be NO movies using H.264 at 40 Mbps. Simple as that. :D
 
It normally would be, but it was "CJplay", the Warner compressionist working on the title, who confirmed the ABR on Batman Begins. From all that's been said about the film, it'll be a showcase title from a PQ perspective. We'll see soon enough! I do agree that higher bitrates are good, but you can get to a point of diminishing returns, depending on how good the encoder is.

If you look at the reviews and the posted bitrates, the bitrate does seem to follow the quality on current HD-DVD. At a 13 Mbit average I will be impressed :)
 
...
(1) More than 90 percent of an H.264 Main Profile stream is encoded using CABAC (I'll assume the HP profile has similar characteristics)

(2) The decoding algorithm is basically sequential and needs heavy computation to calculate the range, offset and context variables

(3) Other stages of AVC (HP) decoding can be parallelized and pipelined relatively easily (Not sure how large a percentage of the work these constitute).
...
There sure have to be sync-points, so you can jump to frame X without running through ALL preceding frames. As soon as you have such sync-points you can trivially start processing from multiple sync-points; the cost is more memory used for buffering, and bandwidth. Say you have a sync-point at least every 3 frames; then you could read 6 frames and kick off 2 SPEs to start decoding CABAC from frames n and n+3, but you need to provide enough buffers to hold the compressed/uncompressed data.
Another SPE would then take the unCABACed data and decode it further.
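
A toy version of the sync-point scheme above: with a sync point every SYNC frames, the segment starting at frame k*SYNC decodes independently, so segments can simply be dealt round-robin to the CABAC workers (SPEs). Buffer sizing is ignored here, and the constants are just the ones from the example.

```c
#define SYNC        3  /* a sync-point at least every 3 frames (assumed) */
#define NUM_WORKERS 2  /* 2 SPEs doing CABAC, as in the example above    */

/* Map a frame number to the worker that decodes its segment. */
static int worker_for_frame(int frame)
{
    return (frame / SYNC) % NUM_WORKERS;
}
```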
 
Wow, 6 pages... I can't believe this thread made it this far... :(

Brimstone said:
See Blu-Ray offers a higher data rate than HD-DVD. A HD-DVD player will never be able to decode H.264 at 40 Mbps because its max video data rate is 28.00 Mbps compared to 40.00 Mbps for Blu-Ray.

The higher allowable bit-rate is mainly to accommodate high bit-rate MPEG2. Even though BD-ROM supports AVC High Profile 4.1 I doubt many discs will be authored with a video bitrate higher than what's permitted in 4.0 (25 Mbps). 4.1 has bit-rates which exceed the video specification for BD-ROM so it's less likely that they'll be used.

So do you think the PS3 will be able to decode H.264/AVC at Blu-Ray's peak rate?

Even early devkits can handle multiple streams, there's no reason why a PS3 wouldn't (outside of some catastrophic bugs).

aaaaa00 said:
Huh?

DD+ is not mandatory on Blu-Ray.

Doesn't really matter. The current BD-ROM spec requires a compliant device to support discs with DD+ (along with DD, Dolby Lossless, DTS, DTS-HD, and multi-channel LPCM.)

fafalada said:
Rumours also said that bilinear filtering cut PS2 fillrate by factor of 4 and god knows what else.

Yes, and those same rumors said that 16 pixel pipelines were too much to be efficiently used (compared to the more common and "normal" 4 pipe setups of the time), unless you drew *large* triangles... ;)

At any rate, if my memory serves me correctly, a certain member of this forum (he'll correct me if I remember wrong) got 1080i H.264 running on PS2 without frame drops. If PS3 can't do this I would be worried about a lot more than its video capabilities.

Yes, you need to be corrected... :p It wasn't 1080i H.264, it was 1080i MPEG2 (technically 1088i, but who's counting)... Actually I was able to decode to just a hair under 48fps (frames not fields) before the IPU ran out of steam. Of course there are CRTC clock/sync issues that negate that as well. Also there was the multi-path XviD decoder, but that was only to 704x512...

Shifty Geezer said:
I'm pretty sure that's not true. AVC is from MS, h.264 is from the MPEG group. They may use the same general compression techniques (or not) but they're different formats.

No, AVC is the MPEG group's designation for MPEG-4 Part 10 (aka H.264). H.264 is the ITU designation. VC-1 is the informal name for SMPTE 421M...

Brimstone said:
The Sony mantra that MPEG-2 at high bit rates is best suited for HD movies for Blu-Ray is a giant PR insurance policy for CELL.

1.) That only deals with movies released by Sony Pictures, and studios that use the MPEG-2 only versions of authoring tools. Ergo, it doesn't prevent other studios from authoring AVC encoded material (e.g. Panasonic's tools).

2.) MPEG-2 is simply being used as it's far more mature and the authoring tools around it are more optimized for rendering out MPEG-2 content. Ergo, it doesn't prevent Sony from continuing to develop their authoring tools around AVC (in fact it *is* being done).

3.) MPEG-2 based content isn't constrained by the immaturity of VC-1 and AVC hardware encoders (e.g. players that sacrifice features like single-frame advance, slow-mo, single-frame reverse, slow-mo-reverse, or just simply clean scanning forwards and backwards).

As fate would have it, Microsoft was thinking big, as in Digital Cinema big. They ditched their previous work and started building a codec from the ground up because they discovered that what worked great at lower resolutions hurt image quality as you cranked up the resolution. So Microsoft was targeting high resolutions and optimizing for them. They stayed away from CABAC because of the heavy processing requirements.

MS didn't ditch their previous work. And I don't know why people are making such a big stink over CABAC... It only represents like 3% of an encode cycle (and is even less significant on a decode cycle). If you're gonna bitch about processing requirements, then bitch about how many cycles are eaten up by motion vector search...

-tkf- said:
Ohh absolutely, but given that VC-1 is very mature I don't expect more miracles

By codec standards it's still quite immature...
 
It normally would be

mmm... normally it doesn't, at least for me ;)

but it was "CJplay" who is the Warner compressionist working on the title who confirmed the ABR on Batman Begins.
From all that's been spoken about the film, it'll be a showcase title from a PQ perspective.

Do you really expect him to say something different ?

Clearly he is going to say that his movie will be the best; we always do that :p
 
One of the big features of the new interactivity standards is that menus don't have to pause the movie. Both interactivity standards allow for menus, animations, videos, and other objects that float over the movie, as well as picture-in-picture, all composited and rendered at 1080p.

A Java interactivity layer isn't going to eat up much CPU at all unless it's running a video game. Interactivity layers are handled by event-driven programming wherein the Java VM is asleep most of the time waiting for a signal to wake up and handle an event (e.g. the user clicked on something). All of the media-oriented stuff (PiP, floating videos, animations) is handled by native code, not Java. The Java code simply sets up a request, hands it off to the underlying BD-J runtime, then goes back to sleep waiting for I/O completion.

With regards to the context model needed for CABAC, it's only a few hundred entries and easily fits within SPE local memory from what I've read. The problem with CABAC isn't updating the model, and it isn't so much branching as lots of shifts and table lookups. CABAC is most expensive in the encode, not the decode. CABAC isn't that parallelizable compared to Huffman tho, but I highly doubt that decode scales worse than n^2, and my intuition says most likely O(n) or O(n log n); after all, why should 2N bits take 4x as long when the decoder, at its heart, doesn't perform doubly nested loops over the entire bitstream? The decoding context at each bit is a tiny fraction of the future bitstream that hasn't been read yet.

This means that decoding 40 Mbps shouldn't be much more than a factor of 4 worse than 10 Mbps, at least as far as CABAC is concerned. Now, the rest of the AVC pipeline might be a different story.
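
The shape of that inner loop, reduced to its renormalization step: after a bin is resolved, range is shifted back up into [256, 512) and the same number of fresh bits is pulled into offset. Every stream bit is consumed exactly once over the whole decode, which is why the cost scales linearly with bitrate. (A simplified sketch: the real CABAC decoder also does the table-driven range subdivision per bin.)

```c
#include <stdint.h>

/* Renormalize range/offset, consuming one stream bit per shift.
   bits[] is the compressed stream as individual bits; returns the
   index of the next unread bit. */
static int renormalize(unsigned *range, unsigned *offset,
                       const uint8_t *bits, int pos)
{
    while (*range < 256) {
        *range <<= 1;                           /* one shift ...        */
        *offset = (*offset << 1) | bits[pos++]; /* ... one bit consumed */
    }
    return pos;
}
```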
 
There sure have to be sync-points, so you can jump to frame X without running through ALL preceding frames. As soon as you have such sync-points you can trivially start processing from multiple sync-points; the cost is more memory used for buffering, and bandwidth. Say you have a sync-point at least every 3 frames; then you could read 6 frames and kick off 2 SPEs to start decoding CABAC from frames n and n+3, but you need to provide enough buffers to hold the compressed/uncompressed data.
Another SPE would then take the unCABACed data and decode it further.

Finally. Thanks for the info. If sync-points (slices?) are frequent enough, then parallelization should be a non-issue.

In a nutshell, H.264 High Profile decoding performs well on Cell (without even using the PPE). I'm happy again. ;-)
 
And I don't know why people are making such a big stink over CABAC... It's only represents like 3% of an encode cycle (and is even less significant on a decode cycle). If you're gonna bitch about processing requirements, then bitch about how many cycles are eaten up on motion vector search...

That's odd. Your comment piqued my interest so I grabbed the sources for ffdshow off sourceforge and ran some profiling passes.

I used Media Player Classic 6.4.9.0, and built my own libavcodec from an Aug 3rd snapshot of the ffdshow_tryout branch. This codec uses the open-source libavcodec H.264 decoder and supports CABAC and HP (and is widely considered a very good software implementation on the PC).

For a ~8.2 Mbit/s 1920x1080p encoding of a Pirates of the Caribbean 2 trailer, on this particular machine (a dual-core AMD64 at 2.4 GHz with 2GB of RAM):

~85% was in the decode_slice function.
~14% was consumed by mplayerc's post processing and DirectShow.
~1% consumed by the OS (Windows Vista).

Of that 85% spent in decode_slice:

~40% was spent decoding the CABAC (decode_mb_cabac)

~60% was spent on macroblocks (hl_decode_mb)
intraslice prediction
motion prediction (hl_motion)
filtering (filter_mb)
and everything else.

So it seems to me that CABAC is not anywhere close to just 3% of CPU load, unless you can tell me where my profiling run went wrong. :)

Thanks.
 
That's odd. Your comment piqued my interest so I grabbed the sources for ffdshow off sourceforge and ran some profiling passes.

I used Media Player Classic 6.4.9.0, and built my own libavcodec from an Aug 3rd snapshot of the ffdshow_tryout branch. This codec uses the open-source libavcodec H.264 decoder and supports CABAC and HP (and is widely considered a very good software implementation on the PC).

For a ~8.2 Mbit/s 1920x1080p encoding of a Pirates of the Caribbean 2 trailer, on this particular machine (a dual-core AMD64 at 2.4 GHz with 2GB of RAM):

~85% was in the decode_slice function.
~14% was consumed by mplayerc's post processing and DirectShow.
~1% consumed by the OS (Windows Vista).

Of that 85% spent in decode_slice:

~40% was spent decoding the CABAC (decode_mb_cabac)

~60% was spent on macroblocks (hl_decode_mb)
intraslice prediction
motion prediction (hl_motion)
filtering (filter_mb)
and everything else.

So it seems to me that CABAC is not anywhere close to just 3% of CPU load, unless you can tell me where my profiling run went wrong. :)

Thanks.

Try CoreAVC. I can play a 720p H.264 stream perfectly with a Pentium M set to 800 MHz.
 
Try CoreAVC. I can play a 720p H.264 stream perfectly with a Pentium M set to 800 MHz.

What bitrate? CABAC cost is proportional to bitrate, so a 40 Mbps stream will consume 8 times the CPU of a 5 Mbps stream just for CABAC, assuming your memory and cache can keep up.

Anyway, it appears that CoreAVC is closed source, so I can't exactly profile it and get any useful information.
 
What bitrate? CABAC cost is proportional to bitrate, so a 40 Mbps stream will consume 8 times the CPU of a 5 Mbps stream just for CABAC, assuming your memory and cache can keep up.

Anyway, it appears that CoreAVC is closed source, so I can't exactly profile it and get any useful information.

I don't know what bitrate it is. I tested it playing the QuickTime HD trailers from http://www.apple.com/trailers/ . They have a high bitrate, but I don't know which, as I don't know how to look it up.
 
I don't know what bitrate it is. I tested it playing the QuickTime HD trailers from http://www.apple.com/trailers/ . They have a high bitrate, but I don't know which, as I don't know how to look it up.

QuickTime is slow and doesn't fully support Main Profile, so I think Apple doesn't turn on all the H.264 features for their encodes; otherwise QuickTime users would have problems playing them.
 