Avivo Transcoding....


bigz
02-Nov-2005, 21:52
http://www.extremetech.com/article2/0,1697,1880749,00.asp

This looks interesting... reduce your transcoding times to 1/5th of the time. :shock:

Mintmaster
02-Nov-2005, 22:21
Quite impressive.

How much quality do you lose with transcoding? Is it quite noticeable?

archie4oz
02-Nov-2005, 23:03
Also can't scale, frame-rate convert, crop, or much of anything else yet...

Bouncing Zabaglione Bros.
02-Nov-2005, 23:15
Quite impressive.

How much quality do you lose with transcoding? Is it quite noticeable?

Depends on the codecs used and how you have them set. This is just doing the same calculations as you would when transcoding on the CPU, with the same levels of quality - just a lot faster because it's on the GPU. There should be no difference in quality when running the calculations on the GPU as opposed to the CPU.

Bouncing Zabaglione Bros.
02-Nov-2005, 23:17
Also can't scale, frame-rate convert, crop, or much of anything else yet...
It's only an alpha tech demo, but it could be finished up to be quite a nice product. Companies like Divx or Nero could integrate this technique and see massive speedups in their transcoding performance.

IgnorancePersonified
02-Nov-2005, 23:34
Shiezer! :shock: (If I spelt that right)

Geo
03-Nov-2005, 02:41
Yeah, pretty damn sweet. We were talking about it in the "Cheng on Avivo" thread in Industry.

X1800XT seems like a more "balanced" part than G70. The "overall performance" tiara isn't the only one on the table. IQ and video processing (tho it is a bit early on the latter, but I'm liking where it is going so far) seem to go to X1800XT. It depends on what you really want and just how important the performance delta is for your given situation.

I'm not in the market at all right now, but if I was I'd probably go X1800XT.

Which is not to minimize (if the rumors are correct) the 512mb version of GTX. It has been three years since NV had the "overall performance" tiara in a flat-out unquestionable way, and they deserve acknowledgement that it appears they are about to do it again. In my mind (again I'm accepting the rumors), the 512mb GTX finally buries the last vestiges of the NV30 debacle, from a "product" perspective (yes, I understand that some will never "forgive" other elements) after three years.

But I still like X1800 a bit better. YMMV.

DemoCoder
03-Nov-2005, 03:16
Yeah, but what about encoding? Transcoding from MPEG-2 is no help if you're trying to produce video and your assets aren't mpeg-2 already.

Geo
03-Nov-2005, 03:39
Yeah, but what about encoding? Transcoding from MPEG-2 is no help if you're trying to produce video and your assets aren't mpeg-2 already.

Well, you're Mr. Video Slut --you tell us. Is there a substantive difference in the hardware assets necessary to do both? I've been assuming that if they could transcode like mad that the encode wouldn't be much different. Not necessarily true?

DemoCoder
03-Nov-2005, 04:52
It depends. There are two ways to do transcoding.

1) decompress MPEG-2 and then recompress with H.264
2) utilize part of the work already done by MPEG-2 and reformat or re-run some of the stages of the H.264 codec, but not all.


The problem is, ExtremeTech's test is not fair. They are comparing #1 to #2 (and also comparing Divx to H.264). There are software MPEG-2 to MPEG-4 transcoders on the market. The proper test to determine whether or not any significant hardware acceleration exists would be to compare a pure software transcoder to ATI's (as well as comparing quality). The other problem is, MPEG-2 to MPEG-4 transcoding algorithms differ in technique, so even then it would be an apples-to-oranges comparison, but at least more accurate than comparing MPEG-2 decompression followed by Divx to H.264 transcoding.

To answer your question: the hardware necessary to accelerate transcoding is not exactly the same as the hardware necessary to accelerate encoding. Transcoding is a simplified problem, and some implementations of it are little more than reformatting the syntax of the bitstream and dropping frames or resolution.

Geo
03-Nov-2005, 05:11
Interesting. Thanks. I would imagine if they could leverage work already done without quality impact, they would. At least from an efficiency point of view and getting it done fast, why wouldn't you? So likely #2 would make sense even if they had the hardware to do #1 efficiently. Doing more work than necessary is rarely going to be faster.

Would it be true that there is a substantial overlap on the transistor budget for doing both? Any guesses on what your extra percentage would be for adding encode transistors that don't help with transcode? I understand we're being highly theoretical here, and within a month or two we will flat out know, but this is the kind of thing we do here, so wotthehell. :smile: So, say, if 100% gives you both fully accelerated encode and fully accelerated transcode, and A% is the transistor budget for transcode only, what is B% (where A + B = 100) to add fully accelerated encode?

Bouncing Zabaglione Bros.
03-Nov-2005, 10:36
Yeah, but what about encoding? Transcoding from MPEG-2 is no help if you're trying to produce video and your assets aren't mpeg-2 already.
I suppose it depends on whether they are using the MPEG hardware on the chip, or if they are running calculations in a GPGPU (?) fashion. There's no place in the article where it says this only works for MPEG2, and it would be rather limited if that were the case. The fact that they can encode to other formats makes me think they can easily do the necessary calculations on the GPU, so there's no reason they can't do the input calculations too.

Just as other apps like VirtualDub, or DrDivx can accept inputs of many different filetypes, I would expect this new app (or other apps that use this technique) to accept any and all filetypes as input. Wouldn't be much use otherwise.

Rys
03-Nov-2005, 11:21
I don't think it can encode from raw fields to H.264. The Theater presents MPEG-2 to the GPU over the VIP. So unless the GPU can work with raw data, and the presenting video capture device just sends it in with no pre-processing.....

Don't think that's the case, and certainly won't be with the non-AIW VIVO models at the very least. The hardware combo doesn't work the way it needs to for that.

Simon F
03-Nov-2005, 12:27
I don't think it can encode from raw fields to H.264.
That would be a miracle. H264 is fiendishly complicated

_xxx_
03-Nov-2005, 13:03
It's only an alpha tech demo, but it could be finished up to be quite a nice product. Companies like Divx or Nero could integrate this technique and see massive speedups in their transcoding performance.

I hope Nero will, the transcoder in their package just plain sucks although it's the best burning SW out there.

Bouncing Zabaglione Bros.
03-Nov-2005, 13:07
That would be a miracle. H264 is fiendishly complicated

There are already H264 encode algorithms, and even free libraries to do it. Why wouldn't ATI or Nvidia simply run that code on their GPUs instead of the CPU as happens now?

This is not just about transcoding video streams, this is about using the GPU as a specialised processor, for tasks like video transcoding, physics calculations, etc..

Gubbi
03-Nov-2005, 13:19
To answer your question: the hardware necessary to accelerate transcoding is not exactly the same as the hardware necessary to accelerate encoding. Transcoding is a simplified problem, and some implementations of it are little more than reformatting the syntax of the bitstream and dropping frames or resolution.

Absolutely right, also when transcoding from one codec to the same codec all kinds of shortcuts can be made.

I know that Nero Recode only recodes I-frames in mpeg-2 streams, there'd be no point in recoding P and B frames. NR is pretty fscking fast, <10 minutes for an entire DVD, so I don't know how much benefit hardware assist would give here..

Originally they decoded the video stream and then re-encoded it to mpeg-2 again. Back then transcoding time would be a decimal order of magnitude slower (It could make a better result, because the encoder could make different choices regarding I/P/B frame mix and amount of bits per I-frame). This is exactly the same "penalty" as transcoding between two codecs. This could probably see a significant speed up.

Edit: Regarding NR, not only does it only recode I-frames, it only does re-quantization of the data, so it's just symbol-decode->re-quantization -> symbol-encode, so no DCT/iDCT involved.
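For anyone who wants to see what "re-quantization only" amounts to, here is a minimal sketch. It assumes a plain uniform quantizer with a single step size, which is a simplification: real MPEG-2 uses quantization matrices and a per-macroblock scale, and nothing here is taken from Nero Recode itself. The function name and step sizes are made up for illustration.

```python
def requantize(levels, old_step, new_step):
    """Map already-quantized coefficient levels onto a coarser step size.

    No DCT or inverse DCT is involved: we only rescale the integer levels
    that came out of the symbol decoder (hypothetical uniform quantizer).
    """
    out = []
    for level in levels:
        value = level * old_step              # approximate coefficient value
        sign = -1 if value < 0 else 1
        # round the magnitude to the nearest multiple of the new step
        out.append(sign * ((abs(value) + new_step // 2) // new_step))
    return out


levels = [12, -7, 3, 1, 0, -1, 0, 0]          # made-up levels for one block
coarser = requantize(levels, old_step=4, new_step=12)
print(coarser)                                # [4, -2, 1, 0, 0, 0, 0, 0]
```

The small coefficients collapse to zero, which is where the bit savings come from once the levels are run through the symbol encoder again.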

Cheers
Gubbi

Simon F
03-Nov-2005, 13:26
There are already H264 encode algorithms, and even free libraries to do it.
I'm well aware that code exists to do it - the standard comes with a reference encoder/decoder.
Why wouldn't ATI or Nvidia simply run that code on their GPUs instead of the CPU as happens now?
Same reason that you don't currently run your word processor on the GPU. You can't just go and take any old C/C++ code, recompile and run it.
This is not just about transcoding video streams, this is about using the GPU as a specialised processor, for tasks like video transcoding, physics calculations, etc..
For some aspects of the video encode, the GPU would be great (motion vector searching would be a possibility) but other parts of the process would be too painful to even bear thinking about.:sad:
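To show why motion vector searching looks like a good fit, here is a rough sketch of exhaustive block matching using the sum of absolute differences (SAD). This is a generic textbook technique, not anything known about ATI's tool; the frame contents, block size and search range are invented for the example. Every candidate displacement is scored independently of the others, which is the kind of work that spreads across many processors.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n, search):
    """Score every displacement within +/- `search` pixels; return (cost, dx, dy)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # skip candidates that fall outside the reference frame
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)   # independent of other candidates
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Made-up 12 x 12 reference frame, and a current frame that is the
# reference shifted by 2 pixels horizontally and 1 pixel vertically.
ref = [[x * x + 3 * y * y + x * y for x in range(12)] for y in range(12)]
cur = [[ref[y + 1][x + 2] if y + 1 < 12 and x + 2 < 12 else 0
        for x in range(12)] for y in range(12)]

best = best_motion_vector(cur, ref, bx=4, by=4, n=4, search=3)
print(best)                                           # (0, 2, 1)
```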

nobody
03-Nov-2005, 13:27
Depends on the codecs used and how you have them set. This is just doing the same calculations as you would when transcoding on the CPU, with the same levels of quality - just a lot faster because it's on the GPU. There should be no difference in quality when running the calculations on the GPU as opposed to the CPU.


Do you also think that every 3D accelerator produces the same image quality?
If not, then you should NOW have recognized that you're talking bullshit.

Bouncing Zabaglione Bros.
03-Nov-2005, 13:35
Do you also think that every 3D accelerator produces the same image quality?
If not, then you should NOW have recognized that you're talking bullshit.

Do you think the image quality a chip produces when rendering games has anything to do with the results of using it as a specialised number cruncher? Do you think every transcoder running on a CPU produces the same image quality as every other?

It's one thing to take shortcuts in games when rendering an image, it's quite another to have any other result than 4 when adding 2+2.

<zorro>

I think it is you who have the shit of the bull in your face!

</zorro>

Simon F
03-Nov-2005, 15:23
Bouncing, with all due respect, I think you may be wrong in this. Some of the standards are very particular about what you can and can't do accurately, which means you can't just go and use the built-in hardware functions.

Bouncing Zabaglione Bros.
03-Nov-2005, 15:35
Bouncing, with all due respect, I think you may be wrong in this. Some of the standards are very particular about what you can and can't do accurately, which means you can't just go and use the built-in hardware functions.
If ATI can use the R5x0 hardware to do physics calculations, why can't they use it to do general calculations, especially ones that are particularly suitable for parallelizing like transcoding? I think the hardware is flexible enough to do this.

If you read the article:

Currently the company is working on additional features and profiles, including profiles for the PSP, video iPod, and H.264 encoding.
So ATI obviously think they can do something useful here, and I doubt that will be by ignoring the codec spec.

Simon F
03-Nov-2005, 16:56
If ATI can use the R5x0 hardware to do physics calculations, why can't they use it to do general calculations, especially ones that are particularly suitable for parallelizing like transcoding?

With physics, every man and his canine companion will expect there to be a fuzzy answer because
a) there are probably no exact solutions to the equations being solved and
b) they are used to using floats and so know there will be rounding errors.
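Point (b) is easy to see with a few lines of Python. The values below are picked by hand to exaggerate the effect; the sketch only shows that float addition depends on the order of operations, so a parallel reduction that groups terms differently from a sequential loop can produce a different answer, whereas integer arithmetic gives the same result either way.

```python
values = [1e16, 1.0, -1e16, 1.0]

# Sequential, left-to-right sum: the first 1.0 is lost when added to 1e16.
left_to_right = ((values[0] + values[1]) + values[2]) + values[3]

# Same numbers, regrouped the way a pairwise (parallel-style) reduction might.
regrouped = (values[0] + values[2]) + (values[1] + values[3])

print(left_to_right)   # 1.0
print(regrouped)       # 2.0

# With integers the grouping makes no difference.
ints = [10**16, 1, -10**16, 1]
print(((ints[0] + ints[1]) + ints[2]) + ints[3])   # 2
print((ints[0] + ints[2]) + (ints[1] + ints[3]))   # 2
```

That is fine for physics, where nobody expects an exact answer, but awkward when a standard spells out exactly what the result has to be.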


IMHO, some things in video encoding can be assisted by graphics hardware nicely (eg looking for candidate motion vectors), but some other aspects would be utterly hideous to code in a DX shader.

Jawed
03-Nov-2005, 16:58
but some other aspects would be utterly hideous to code in a DX shader.
It would be great if you told us why...

Jawed

Gubbi
03-Nov-2005, 22:13
It would be great if you told us why...


De/encoding of the symbols in an MPEG (at least 1&2) stream is just about as sequential code as you will ever see (Huffman coding).
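A toy decoder makes the problem visible. The code table below is made up for the example (it is not an MPEG table); the point is only that codewords have variable length, so you cannot know where symbol N starts until you have decoded symbols 0 to N-1.

```python
# Hypothetical prefix code, not taken from any MPEG specification.
CODE = {"0": "a", "10": "b", "110": "c", "111": "d"}


def decode(bits):
    """Decode a string of '0'/'1' characters one bit at a time."""
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in CODE:           # a complete codeword has been read
            symbols.append(CODE[current])
            current = ""              # only now do we know where the next one starts
    return symbols


stream = "0101100111"                 # a, b, c, a, d
print(decode(stream))                 # ['a', 'b', 'c', 'a', 'd']

# Starting in the middle of a codeword (offset 2 is inside the 'b')
# produces a wrong symbol at the front.
print(decode(stream[2:]))             # ['a', 'c', 'a', 'd']
```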

Cheers
Gubbi

Rys
04-Nov-2005, 00:05
Yeah, macroblock generation and run-length encoding (to name just two encode processes for H.264) pretty much suck in terms of being parallelisable. CABAC too, probably (although I'm not sure the GPU helps with that, at least not just now in the current transcoder).

IgnorancePersonified
04-Nov-2005, 02:14
Sounds good to me: most encoding I do is from HDTV which is dumped down as an mpeg file.

Simon F
04-Nov-2005, 08:16
De/encoding of the symbols in an MPEG (at least 1&2) stream is just about as sequential code as you will ever see (Huffman coding).

Cheers
Gubbi
And, from what I've read, H264 is even worse! (FWIW (some of?) the source code is on the web).

DemoCoder
04-Nov-2005, 09:22
http://cs.felk.cvut.cz/psc/event/2004/p14.html


Abstract:

We present a cost optimal parallel algorithm for the computation of arithmetic coding. We solve the problem in O(log n) time using n/log n processors on EREW PRAM. This leads to O(n) total cost.


:)

This one shows a more general technique; they handle Huffman and arithmetic coding. I also found parallel versions of Lempel-Ziv.
http://rii.ricoh.com/~gormish/pdf/icip94_abs.pdf

And here is one on decompression of Huffman in parallel
http://domino.research.ibm.com/comm/wwwr_seminar.nsf/pages/sem_abstract_256.html


BTW, I agree with Simon that this would be ugly and in general not good on the GPU.

Simon F
04-Nov-2005, 09:55
http://cs.felk.cvut.cz/psc/event/2004/p14.html
I've been experimenting at home with audio compression and was hoping to use arithmetic encoding so that's an interesting read. Unfortunately, they don't solve the decode problem, and that's the real "fun" part of arithmetic coding. :-)

AFAIU it's even more fun with H264 because you can't decode a symbol until you know what the previous symbol decoded to because they change the stats on a symbol by symbol basis. That's great for compression but pretty much forces the system to be sequential. I suspect that a quantum computer would solve the problem - now where did I leave mine :roll:
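A sketch of that dependency, using move-to-front coding as a stand-in. This is NOT what H264 does (its adaptive coder is far more involved); move-to-front is just about the smallest scheme where the model changes after every symbol, so it shows the same ordering constraint. The alphabet and input are invented.

```python
def mtf_encode(text, alphabet):
    """Replace each symbol by its current position in an adaptive table."""
    table = list(alphabet)
    out = []
    for ch in text:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))     # model update after every symbol
    return out


def mtf_decode(indices, alphabet):
    """Invert mtf_encode; must replay exactly the same model updates."""
    table = list(alphabet)
    out = []
    for i in indices:
        ch = table[i]                     # meaning of i depends on all earlier symbols
        out.append(ch)
        table.insert(0, table.pop(i))
    return "".join(out)


codes = mtf_encode("aabbbc", "abc")
print(codes)                              # [0, 0, 1, 0, 0, 2]
print(mtf_decode(codes, "abc"))           # aabbbc
```

The index 0 means "a" at the start and "b" later on, so a decoder dropped into the middle of the list has no way to interpret it without first replaying everything before it.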


BTW, I agree with Simon that this would be ugly and in general not good on the GPU.
I'll take a look at the other links later.

Gubbi
04-Nov-2005, 10:32
It should be possible to decode each macroblock in parallel. Don't know how much that would buy you though.

Edit: I can see that DC's last link exploits that.

Cheers
Gubbi

Simon F
04-Nov-2005, 10:56
It should be possible to decode each macroblock in parallel.
Cheers
Gubbi
If you just meant doing IDCT, then yes. If you mean "decode the binary stream into symbols" then not in H264 you can't!

Gubbi
04-Nov-2005, 11:05
If you just meant doing IDCT, then yes. If you mean "decode the binary stream into symbols" then not in H264 you can't!

It must be possible (ok, I really don't know :) ), otherwise how do you recover from a transmission error ?

Cheers
Gubbi

Simon F
04-Nov-2005, 11:47
It must be possible (ok, I really don't know :) ), otherwise how do you recover from a transmission error ?
I believe there are places in the stream where you can resynchronise should something untoward happen.
Via the magic of Google (http://www.pixeltools.com/h264_paper.html):
H.264 includes several other features that are useful in containing the impact of errors, and in enabling the use of scalable or multiple bit streams:
• Slice coding. Each picture is subdivided into one or more slices. The slice is given increased importance in H.264 as the basic spatial segment that is independent from its neighbors. Thus, errors or missing data from one slice cannot propagate to any other slice within the picture. This also increases flexibility to extend picture types (I, P, B) down to the level of "slice types." Redundant slices are permitted.
I suppose you could decode these slices in parallel, but that wouldn't help very much if there was one in each frame. :razz:
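Roughly what that would look like, as a sketch only. `decode_slice` here is a made-up placeholder that is strictly sequential inside a slice (each output depends on the previous one) but shares no state between slices; a real slice decoder is obviously much more than this. In CPython the threads will not give a real speedup for this kind of work, so read it as an illustration of the structure rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_slice(slice_bytes):
    """Hypothetical stand-in for a slice decoder: sequential within the slice."""
    state = 0
    out = []
    for b in slice_bytes:
        state = (state + b) % 256         # each value depends on the previous one
        out.append(state)
    return out


def decode_picture(slices, workers=4):
    """Decode independent slices concurrently; results keep slice order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_slice, slices))


slices = [bytes([1, 2, 3]), bytes([10, 20]), bytes([255, 1])]
picture = decode_picture(slices)
print(picture)                            # [[1, 3, 6], [10, 30], [255, 0]]
```

With only one slice per picture the list has a single entry and there is nothing left to run in parallel.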

Captain Chickenpants
04-Nov-2005, 11:50
It must be possible (ok, I really don't know :) ), otherwise how do you recover from a transmission error ?

Cheers
Gubbi

You can't fully decode macroblocks in parallel as if it is an Intra block then it potentially requires input from the decoded adjacent macroblock.

To limit the effect of a transmission error you can have the frame split into multiple slices (a group of macroblocks).

I see Simon has provided a link explaining exactly that
I was going to get some friends to proof read this first, but I guess you lot can do a reasonable job of that, and this seems an appropriate time.
I have put together a simple (ish) explanation of the basics of video encoding.
http://www.gelp.net/content/view/46/29/
Feel free to point out any mistakes.
CC

Gubbi
04-Nov-2005, 12:25
Sorry, I mistook slices for macroblocks, i.e. sync symbols are at the start of a slice, not at the start of a macroblock.

Haven't messed with video since university :)

Cheers
Gubbi

Geo
04-Nov-2005, 13:39
Well, that's all very interesting. But wouldn't you typically capture in real time to an mpeg2 or some other already-processed format anyway?

Captain Chickenpants
04-Nov-2005, 13:52
Well, that's all very interesting. But wouldn't you typically capture in real time to an mpeg2 or some other already-processed format anyway?

Not sure who/what you are replying to?

What do you mean by an 'already processed' format?

Geo
04-Nov-2005, 13:55
Not sure who/what you are replying to?

What do you mean by an 'already processed' format?

Well, we got this started with DC noting he wanted to know how fast encoding was going to be, and I started wondering just when it is all that useful to be doing encoding from raw capture. . .and realized I couldn't think of any time except when you're doing the capture itself. For instance, my wife's AIW captures to mpeg. . .at which point transcode is the useful thing and encode from raw is uninteresting; it's already happened.

Edit: And to answer my own question. Capturing straight to h.264, so you can avoid the transcode entirely. Duh. :lol:

Captain Chickenpants
04-Nov-2005, 14:07
Ah gotcha.

When you say that your wife's AIW captures to mpeg (for example), don't forget that something is having to do the encoding. For video capture boards it is generally a custom chip, but if ATI are looking at using their GPU to do transcoding, then it is probably only a matter of time before they start looking at doing the full encode that way.
In that case the speed of encode will potentially translate to the quality of encode, as the faster you can do the encoding then the more time you have to find good prediction candidates and thus produce better quality for a given bit-rate.

suryad
04-Nov-2005, 18:57
Impressive stuff. But does that mean then that there could be potential uses for users of Avid HD and Adobe Premiere Pro and so on in rendering the raw footage to the final version through the GPU itself? 'Cause right now the whole thing is CPU intensive and it takes a while...even on the fastest procs.

a3dmaster
30-Nov-2005, 13:19
Great Job ATI! The guys from the German online-mag CHIP.de have more detailed info and some benchmarks about the Avivo Xcode technology.

"CHIP Online had the opportunity to test a beta version of ATI's still secret "Avivo XCode" encoding tool. It uses the power of the GPU to reduce video encoding time - into virtually any format - drastically. Our results show: The new ATI solution easily does it 5 times faster than even the fastest CPUs available today!!"

Look at http://www.chip.de/artikel/c1_artikel_17670022.html

nobody
30-Nov-2005, 14:02
Great Job ATI! The guys from the German online-mag CHIP.de have more detailed info and some benchmarks about the Avivo Xcode technology.

"CHIP Online had the opportunity to test a beta version of ATI's still secret "Avivo XCode" encoding tool. It uses the power of the GPU to reduce video encoding time - into virtually any format - drastically. Our results show: The new ATI solution easily does it 5 times faster than even the fastest CPUs available today!!"

Look at http://www.chip.de/artikel/c1_artikel_17670022.html

From same source:


Remarkable: During the encoding process, XCode uses the main CPU to its maximum – the task manager continually shows 100 percent usage. In contrast, the Windows Media Encoder (who’s supposed to use only the CPU) shows fluctuations from 75 to 100 percent.


That's not what I expect when they talk about moving the encoder processing load from the CPU to the GPU. :(

a3dmaster
30-Nov-2005, 14:24
From same source:



That's not what I expect when they talk about moving the encoder processing load from the CPU to the GPU. :(

That's true, but without Avivo XCode your CPU load is still 100 % and the encoder process is up to 5 times longer.

Bouncing Zabaglione Bros.
30-Nov-2005, 14:49
From same source:

That's not what I expect when they talk about moving the encoder processing load from the CPU to the GPU. :(

I think the point is that when encoding, not only is the GPU speeding things up dramatically, but your CPU is being used to its maximum to speed encoding as much as possible. They point out that when using WME, your CPU is not even fully utilized, increasing the encode time even more than necessary.

no-X
08-Dec-2005, 13:31
ATi is preparing Catalyst 5.13 driver with decent AVIVO support and many new video features:

http://www.theinquirer.net/?article=28216

wireframe
08-Dec-2005, 14:02
Very interesting results. It would be nice if someone could release an actual clip encoded with Xcode. Also, I am not sure how "beta" this tool is, if it is purely for demonstration purposes or if this resembles the actual tool that will be released (it looks like a soon-to-be-released product), but it seems very limited in what the user can do. I'm not talking about inputting multiple files - this can be handled externally - but things like setting exact bitrates, multiple pass encoding, triggering certain parameters in the codecs, etc. Perhaps ATI are reserving some of these features for a "Pro" version?

We need some samples for quality comparisons to find out if this tool is for "joe average" or if it produces results that are comparable to the best-of-breed encoders out there. Of course, what we really want is the option to control this ourselves, but if it is this fast all the time there may never be a reason to select "Fastest" mode. :wink:

PS. If this thing really is as spectacular as these articles suggest, I bet Intel and AMD are not very happy about it. That reason to buy a $1,000 CPU just flew out the window and enter the $500 video card that also raises the bar on graphics in your games.

Geo
08-Dec-2005, 14:28
ATi is preparing Catalyst 5.13 driver with decent AVIVO support and many new video features:

http://www.theinquirer.net/?article=28216

Cool! Must be what Terry was hinting at over at Hexus when I asked him.

Geo
08-Dec-2005, 14:33
PS. If this thing really is as spectacular as these articles suggest, I bet Intel and AMD are not very happy about it. That reason to buy a $1,000 CPU just flew out the window and enter the $500 video card that also raises the bar on graphics in your games.

See Nobody's post above near the end. CPU still matters, apparently --tho it will be interesting to see just how much. I hope somebody benches the thing with a variety of them.