"Where do the frames go"...what am I missing?

http://www.tech-report.com/etc/2002q3/agp-download/index.x?pg=1

...but a few of my correspondents pointed out a very practical problem with rendering high-quality graphics in real time (or nearly so) on a graphics chip: getting those rendered frames back from the video card and into main memory or stored on disk.

Well, I'm confused.

The article appears to discuss the shortcomings of "VPU"s used for "professional applications" and "cinematic rendering" compared to CPUs, where the problem outlined is that transferring the rendered image from the framebuffer to "permanent" storage is "slow."

Well, sure, it's slow (8 FPS for 720x480x32)...but isn't it still much faster than rendering the same image on a CPU, which usually takes seconds or minutes (or hours) per frame?
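As a sanity check on that 8 FPS figure, here's some rough arithmetic (my own numbers, not from the article):

```python
# Rough arithmetic behind the "8 FPS for 720x480x32" readback figure.
bytes_per_frame = 720 * 480 * 4               # 32 bpp = 4 bytes per pixel
readback_mb_s = bytes_per_frame * 8 / 2**20   # 8 frames per second

print(round(readback_mb_s, 1))  # ~10.5 MB/s of sustained readback
```

So even the "slow" readback path moves on the order of 10 MB/s, while a software renderer might take minutes per frame, which is exactly the point being made.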

While today's graphics cards can render images very quickly, the software drivers are painfully slow at getting rendered output back to the PC where it could be saved and put to work by users

Well, since when does "rendered output that needs to be saved where it could be put to work by the users" need to be real-time? I don't really understand the problem...maybe someone else can elaborate?

The article's description of the "Serious Magic" benchmark outlines a few "benefits" of real-time frame-buffer-to-permanent-storage transfer...but I don't see those benefitting "3D Professionals". Just a few gimmicky things (recording video games?) that could be worked around very easily using a card with video-out circuitry....

In short, I don't see this "transfer" issue as some real shortcoming for the "cinematic rendering" or "production quality" graphics that was the focus of the original article. This is just a "gee, that would be cool if it could be done" kind of thing.

I also don't think it's "just a driver" issue, as we are being led to believe. Just because the "specs" of the AGP interface say one thing, that doesn't mean that in reality those transfer rates are possible in a streaming, constant fashion....
 
Hmm, interesting.
They seem to be unaware that the video card itself will be using a lot of bandwidth; that is why faster transfer mechanisms are developed, because graphics cards can use almost whatever bus bandwidth you can give them.

On top of that, the AGP bus may already be in use transferring textures etc., and the card itself is actually busy rendering the images.

Bandwidths quoted will generally be peak bandwidths, and will not include the latency of the request or any setup time for the transfer.
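A toy model of that effect (all numbers here are illustrative, not measured): if every transfer pays a fixed setup/latency cost, small transfers only ever see a fraction of the quoted peak.

```python
def effective_bandwidth(peak_mb_s, transfer_bytes, setup_us):
    """Effective MB/s when each transfer pays a fixed setup cost."""
    data_time = transfer_bytes / (peak_mb_s * 2**20)  # seconds moving data
    total_time = data_time + setup_us * 1e-6          # plus setup latency
    return transfer_bytes / total_time / 2**20

# AGP 4X quotes ~1066 MB/s peak; with an assumed 10 us of setup cost
# per 4 KB transfer, the sustained rate collapses to ~286 MB/s.
print(round(effective_bandwidth(1066, 4096, 10)))
```

Shrink the transfer size or grow the setup cost and the sustained figure drops even further below the quoted peak.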

AGP is not, and never was, intended for high-speed downloads; its main purpose was to get data into a graphics card as fast as possible, not the other way round.

This article seems to be making some rather grand assumptions on what is possible, and who is at fault.

Interesting article, and the purpose is valid: today's systems are inadequate for using and storing real-time 3D graphics. But the research done is a little uninformed.

(p.s. This is all my opinion and beliefs, and may not in actual fact be accurate.)

CC
 
Hmmm, interesting dilemma... If you want to use the graphics chip to do the rendering (in e.g. 3ds max) for stills or a movie, you will need to write from the framebuffer back to the hard disk instead of to the screen. So AGP would need to transfer data both ways and could thus slow things down, or at the least be more of a bottleneck than the shaders themselves.
 
I was under the impression that AGP was essentially limited to PCI speeds on writeback.

But I agree, you'll most likely be doing so much work on the rendering that it will dwarf the amount of time consumed in copying the frames.
 
What technical aspect of AGP is it that makes downloads unbelievably slow then? AGP is after all just an extension of PCI, and that protocol certainly has no problems with transfers in either direction.

*G*
 
Video memory was never suitable for reading by the CPU.
This goes back to the days when it wasn't suitable for writing either :)

In the old days most of the bandwidth was occupied by refresh (i.e. the RAMDAC). Since the RAMDAC read the memory in long bursts, the latency for the CPU was very large, and since nothing could hide that latency, it caused very low effective bandwidth figures.
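To put a number on how much bandwidth refresh alone eats (the mode and refresh rate are my own illustrative picks):

```python
# Bandwidth the RAMDAC alone needs just to scan out the display.
w, h, bytes_per_pixel, refresh_hz = 1024, 768, 4, 75   # 1024x768x32 @ 75 Hz
scanout_mb_s = w * h * bytes_per_pixel * refresh_hz / 2**20

print(round(scanout_mb_s))  # 225 MB/s before the CPU touches anything
```

On a card of that era with only a few hundred MB/s of memory bandwidth total, that left very little for CPU accesses, and each one stalled behind the RAMDAC's bursts as well.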

Around the time "Windows accelerated" video cards were introduced, they implemented a write FIFO in the graphics chip, hiding the latency from the CPU and causing a 10x or greater increase in effective bandwidth.

The same accelerators provided hardware accelerated blitting so there was no more reason for the CPU to read the video memory.

So nothing has been done to accelerate reading since then; there was simply no demand.

That does not mean it is not possible!
Assuming continuous reads, speculative read-ahead could do the same wonders that were done for writing. And that's without the need to rewrite any software.
 
I read this article yesterday. It's quite perplexing. I'm not sure why one would want things done in that manner. It sounds akin to hitting "capture window/print screen" and then doing a File->Save.

I'm not sure the conclusions are proper either. He seems to be blaming drivers, when it could be a problem with the API or with a lack of understanding of how memory accesses are optimized. That's unclear. If I were to approach that problem, I'd require bus-mastering out of vram and AGP. Even then, there would be stalls for synchronization, without taking into account everything else that is being done while this "capture" is being performed. The bandwidth consumed by the card for its own operations and continued rendering, for example.

Overall, I think the complaints are poorly written, with moderate amounts of finger pointing, which are probably not justified. I've not analyzed this "benchmark" and I'm not sure I want to waste the time in determining how they are downloading render targets into PC memory. If it is not page locked memory, it would definitely sound like a DX lock/unlock or a glReadPixels is being performed. That's speculation, but still...*yawn*. I think the problems lie in the lack of understanding that the graphics data pump generally goes one way...into vram (even the APIs for multimedia are optimized in that fashion), and trying to develop a software only solution to capture images is, well, just going to be pathetically slow. Accessing "moving" images requires synchronization in one form or another. For software, that's idling the pipe.
 
Maverick said:
Any particular reason that they couldn't just capture the image data from the cards DVI port?

I'll just venture a guess here and say that the DVI output can't support the 64-bit color depth that most cinematic renders are stored at.

If my guess is correct, then the TMDS is reducing the high precision color frame down to a format suitable for display (i.e., 32 bit or 40 bit).

Is this anything like correct? :oops:
 
I think a few different issues are getting confused here. The company that prompted the article, Serious Magic, develops video editing software. They want to use the AGP bus to stream processed video frames out of the graphics chip to system memory or the HD. The author of the article then used their benchmark to draw his own conclusions about how this would impact cinematic rendering.

When you're talking about streaming video, you're probably not dealing with high resolution images or color depths > 32bpp. So you could just capture the data from the DVI port as Maverick suggested.

When you're talking about cinematic rendering, you're dealing with very high resolutions and color depths, but you're not really worried about frame rate. In this case, the DVI port wouldn't be useful, but even a 12 MB/sec texture download speed over the AGP bus would let you write data out of the graphics card at a few frames per second, which would be entirely sufficient for generating final production-quality frames.
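A quick back-of-the-envelope check of that last claim (the resolution and color depth are my own illustrative picks):

```python
# Frames per second achievable over a 12 MB/s readback path.
w, h, bytes_per_pixel = 1024, 768, 8         # 64-bit color = 8 bytes/pixel
frame_mb = w * h * bytes_per_pixel / 2**20   # 6.0 MB per frame
fps = 12 / frame_mb

print(fps)  # 2.0 frames per second over the readback path
```

Two frames per second of final output is glacial for playback but perfectly acceptable for writing production frames to disk, given that a software renderer may take minutes per frame to produce them in the first place.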
 
There is nothing inherent in AGP that would cause GPU->system memory transfers to be so terribly slow - in fact, the AGP standard specifies a bus mastering mechanism for transferring data out of the graphics card at the same bandwidth as the fastest data transfers into the card (=1 GByte/sec for AGP 4X). Support for this mechanism is however optional on the GPU side - and it appears that Nvidia/ATI/Matrox at least haven't implemented it in their hardware, presumably due to lack of demand.

The measured figures of ~8-13 MBytes/sec sustained readout speeds are extremely low (about 1/20 of the bandwidth that even weakly optimized PCI reads should have been able to reach) - so low that any application that tries to read out frames that way will be dominated by the time that the readout takes rather than the time needed to draw the frame. And it looks to me that the low speed is for the most part a GPU hardware issue that cannot be fixed by better drivers.

And as far as DVI is concerned: IIRC, the current specification of DVI is limited to 8 bits per color channel per pixel, with no alpha channel.
 
Ok... this is very strange...

I have a GeForce2, and I get ~100-110Mb/s

What the? I had a friend with a GF2 try it, and he only gets 9. I'm terribly confused :eek:
 
What I don't get about this whole thing is, we don't even have any video cards out today that are really designed for high-end rendering. What we apparently have are video chips that could possibly be placed in video cards for use in high-end rendering.

Why would these high-end rendering cards need to have the same limitation? One example of a possibility would be to just display the frames through DVI-output that is read by a second machine and stored (since it's all digital, there should be no loss, I believe...). If such a thing indeed speeds it up by a factor of a hundred, why not?
 
The article seemed to come from Serious Magic's questioning. They're trying to use the video card for something it wasn't designed for. However if they only worry about real-time previews they might be able to make it work.

Consider Matrox's RT2000 editing board. It's been replaced with newer parts, but it serves as a good example. The setup consisted of two cards: the Flex 3D version of a G400 and a PCI card. The G400 was used to do real-time 3D effects like Serious Magic desires. But it only output to the screen for previews. The final rendering was done in software. I have a feeling that if the hardware had allowed it, Matrox would have copied the data back to the hard drive so they could accelerate final rendering. The fact that they didn't makes me think it is a hardware issue.

One development that might change all this is PCI Express. AGP is bi-directional over a shared set of lines, which means there is a penalty for trying to simultaneously upload and download. PCI Express will have independent upload and download channels, each of which has more bandwidth than even AGP 8x.
 
DemonYoshi said:
Ok... this is very strange...

I have a GeForce2, and I get ~100-110Mb/s

What the? I had a friend with a GF2 try it, and he only gets 9. I'm terribly confused :eek:

What sort of chipset/CPU do you use? What chipset/CPU does your friend use? Maybe that can explain the difference.
 
Joe,
If this is the same article that was mentioned on Slashdot then it all seems like a storm in a teacup.

It seemed to be complaining that, although the graphics card can produce image data at T Mb/s (say >300Mb/s), they can't copy it at that rate back in to memory.

So what? If they could read it at that rate, what the hell would they do with it? It won't take very long to completely fill host memory, you couldn't hope to do much CPU processing on it in that time, and you couldn't really hope to write it (sustained) to permanent storage at the same rate.
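For scale (assuming, say, 512 MB of host RAM, a number of my own choosing):

```python
# How long a >300 MB/s stream takes to fill host memory outright.
host_ram_mb = 512
stream_mb_s = 300
seconds_to_fill = host_ram_mb / stream_mb_s

print(round(seconds_to_fill, 1))  # ~1.7 seconds to fill all of RAM
```

Less than two seconds before the data has nowhere left to go, which makes the point that the readback rate is not the only bottleneck in that pipeline.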
 
Simon F said:
Joe,
If this is the same article that was mentioned on Slashdot then it all seems like a storm in a teacup.

It seemed to be complaining that, although the graphics card can produce image data at T Mb/s (say >300Mb/s), they can't copy it at that rate back in to memory.

So what? If they could read it at that rate, what the hell would they do with it? It won't take very long to completely fill host memory, you couldn't hope to do much CPU processing on it in that time, and you couldn't really hope to write it (sustained) to permanent storage at the same rate.

unquestionably Simon's remark about the inability of present storage devices to sustain >300MB/s is valid. still, it does make sense for a real-time production system to try and get sufficient feedback from any one of its stream producers to achieve the final output's targeted framerate. what i mean is something along the lines of the following config: a video-capture card, a 3d fx producer (i.e. a 3d card), and a final output VGA-like device (i.e. a mem-mapped, write-only framebuffer for the final output); you want to combine the output from the capture card with whatever 3d animation the 3d card produced (say, as an overlay) and place the combined frame in the output device's framebuffer. and you're targeting N fps. now, the capture card can deliver said N fps, and the 3d fx card can do its stuff at N fps. but whereas the capture card's output is conveniently placed in main mem, the 3d card's output needs to be deliberately fetched back. hence the need to do it quickly.

i'm not in the video production business (read: i'm a complete layman wrt that), so does the above make sense?
 
darkblu said:
unquestionably Simon's remark about the inability of present storage devices to sustain >300MB/s is valid. still, it does make sense for a real-time production system to try and get sufficient feedback from any one of its stream producers to achieve the final output's targeted framerate. what i mean is something along the lines of the following config: a video-capture card, a 3d fx producer (i.e. a 3d card), and a final output VGA-like device (i.e. a mem-mapped, write-only framebuffer for the final output); you want to combine the output from the capture card with whatever 3d animation the 3d card produced (say, as an overlay) and place the combined frame in the output device's framebuffer. and you're targeting N fps. now, the capture card can deliver said N fps, and the 3d fx card can do its stuff at N fps. but whereas the capture card's output is conveniently placed in main mem, the 3d card's output needs to be deliberately fetched back. hence the need to do it quickly.

i'm not in the video production business (read: i'm a complete layman wrt that), so does the above make sense?

There's no need for the 3d animation rendering to be copied to main memory when the 3d card can do the compositing. Instead stream the captured video to the 3d card and let the texture engine do the compositing for you. If the final output is to the frame buffer you're done. If you want the final frame to be copied back to memory, that's where the hard drive limitations come into play.
 
mboeller said:
What sort of chipset/CPU do You use? What chipset/CPU does Your friend use? Maybe this can explain the difference.

He has a P3 800, I have a P3 933. Another friend, also with a P3 933 but with a GF4 4400, got 13.

I think all of us have VIA chipsets... but I'm not positive about them.
 