Ok. Part 2.
Bigger is better.
There are different approaches to rendering 3D scenes. You have a whole collection of objects and rules; you need to put them in the right place, transform them and render them. That is called scene management. And when you have assembled your scene, you send it all to the graphics chip, together with all the rules, then sit back and watch the show. Right?
Well, that would be the ultimate goal. But it isn't as easy as that. For starters, while DirectX 9 class graphics chips can do quite a lot of generic processing, they can't do it all, and there is still no easy way to apply things like physics rules or to start the appropriate sound at the right place and moment. Further, the memory needed to contain all that is limited. More on that later.
If we look at the functions that can be executed by the GPU (Graphics Processing Unit, the chip), we see that there isn't a chip in existence that can do everything in the specs of the API (Application Programming Interface, the thing programmers use to make it do what they want). So you have to mix and match functions according to what the current hardware offers to get the result you want.
There are basically two different ways to handle that. The first, as used by DirectX, is to flag what can be done by the hardware and discard the rest. The other, as used by OpenGL, is to emulate everything the hardware cannot do in software. Both have their pros and cons: while just about anything will run on OpenGL, it might be VERY slow if most of it is emulated by the CPU (Central Processing Unit, the processor on your motherboard). With DirectX, it will all run fast, but it might look totally different from what you expected.
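To make that concrete: on the DirectX side you ask the driver up front what the chip can do and pick a code path accordingly. A minimal sketch in C++ (the function name and the fallback policy are mine, but GetDeviceCaps and the caps structure are the real DirectX 9 API):

    #include <d3d9.h>

    // Ask the driver for the hardware capabilities and check whether
    // the chip supports pixel shader model 2.0; if not, the renderer
    // should fall back to a simpler technique instead of just hoping.
    bool SupportsPixelShader2(IDirect3D9* d3d)
    {
        D3DCAPS9 caps;
        if (FAILED(d3d->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps)))
            return false;
        return caps.PixelShaderVersion >= D3DPS_VERSION(2, 0);
    }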
Even so, a lot of work isn't done by the GPU as expected, but by the drivers, which use the CPU to do their work. And some things might run on the GPU, but too slowly to be of any practical use.
Which brings us to the central point: size matters.
Now that we have a basic platform that can do just about anything you want with the current generation of graphics chips, you want it done fast, so that at least 30 fps are shown. That leaves a budget of roughly 33 milliseconds per frame.
How much memory do we want the GPU to have? As much as possible? Sure! But that memory is only there, right next to the GPU, so that it can be accessed faster than the memory on your motherboard. If we could get at that memory fast enough, we wouldn't need any memory next to the GPU at all. Speed is everything.
If we create a very nice and huge 3D world, we would like to hand it all to the GPU and let it sort out the rendering. But it might be too much to fit into its memory, and it would take way too long to render a frame. So we need to clip it. That means we remove everything that isn't visible, and what's left is sent to the GPU and rendered.
To do that, we need a way to store the locations of all the objects and determine which parts are visible: a scene graph. Which is a pretty hard thing to do, as an arbitrary number of 3D objects of all sizes at random locations doesn't fit neatly into a grid.
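As a rough illustration of the idea (the types and the test are a sketch of mine, not any particular engine): give every node a bounding sphere that encloses it and its children, and reject whole subtrees that fall outside the view frustum.

    #include <vector>

    struct Vec3  { float x, y, z; };
    struct Plane { Vec3 n; float d; };   // dot(n, p) + d = 0, normal points into the view volume

    struct Node {
        Vec3 center;                     // bounding sphere around this node and all its children
        float radius;
        std::vector<Node*> children;
    };

    float Dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    // Collect every node whose bounding sphere touches the view frustum.
    // A node completely outside one plane is skipped together with its
    // whole subtree, so most of the world is never even visited.
    void Cull(const Node* n, const Plane frustum[6], std::vector<const Node*>& visible)
    {
        for (int i = 0; i < 6; ++i)
            if (Dot(frustum[i].n, n->center) + frustum[i].d < -n->radius)
                return;                  // fully outside: reject this subtree
        visible.push_back(n);
        for (Node* c : n->children)
            Cull(c, frustum, visible);
    }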
All of this boils down to basically two things: very fast communication and fast pixel rendering. Which translates to very fast RAM and as many pipelines as you can get.
When we talk about fast memory, we talk about two things: the time it takes to get the data from memory (latency), and the total amount of data you can get in a certain time (bandwidth).
The CPU in your computer basically runs a single sequence of instructions, so it wants very low latency, as waiting for the next piece of data might stall the whole processor. A GPU, on the other hand, executes a large number of simple tasks in parallel (the calculation of all the pixels in a frame), so it can start on the next pixel while the first one is waiting for its data to arrive. It can use all the data that arrives, and it will only stall if its whole capacity is spent waiting for data. Therefore, it depends on bandwidth.
The "size" part in the RAM is therefore not the total size of the RAM, but the amount of data that arrives at the GPU every second. And this goes a long way: a bit of superfast memory offers better possibilities than a very large amount of slow memory. The size of the bandwith is everything.
RAM is, compared to a CPU or GPU, terribly slow. And it doesn't help that you have to push all that data through wires, from the RAM chips to the GPU and back (the bus). Those wires are embedded in the graphics board as copper traces, and the chips need pins to connect to those traces. So making an ultra-wide bus would be prohibitively expensive when designing the board and chips: the chips would need very many pins (and become huge), and all those traces have to travel through the board to all the chips, so the board would become very thick and very expensive to make.
To solve that, they use DDR (Double Data Rate) RAM. That is memory that transfers data on both the rising and the falling edge of the clock signal, sending double the amount of data over the same bus. And there is even quad-pumped DDR nowadays, which theoretically quadruples the bandwidth.
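The arithmetic is straightforward (the figures here are made up for illustration): bandwidth = bus width x clock rate x transfers per clock. A 256-bit bus at 500 MHz, double-pumped, delivers 32 bytes x 500 million x 2 = 32 GB/s; quad-pumping the same bus would double that again, without adding a single pin or copper trace.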
To counter all the waiting and stalling (latency), caches are used. Those are very fast buffers on the GPU itself that hold data for immediate use. So, why not put all the memory on the same chip as the GPU?
The cost of a chip is determined mostly by two things: the size of the die (the rectangle that contains the millions and millions of microscopic transistors) and the fraction of chips that are not defective after manufacturing (the yield). If you make very large chips, very many of them will be defective, while only a small number can be produced at once (they come off a slice of silicon roughly the size of a CD). Which makes the few remaining ones extremely expensive.
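A common first-order model (Poisson-distributed defects; the numbers are illustrative) shows how brutal this is: with defect density D and die area A, the yield is roughly e^(-D x A). At 0.5 defects per square centimeter, a 1 cm^2 die yields about 61%, while a 4 cm^2 die yields about 14%. And the big die gives you only a quarter as many candidates per wafer to begin with.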
So, in one case, bigger isn't better!
Let's recap:
- More functionality (so it runs in hardware) is better
- More general programmability (shaders, for better visuals) is better
- More bandwidth (the fastest memory and the widest bus) is better
- More pipelines (so more can be done at the same time) is better
- A bigger chip would be better, if it wasn't so very expensive