One reason you don't see much documentation is that different driver stacks use different memory management schemes. Even within a single driver stack, the scheme can vary with the OS and the hardware generation.
Some are pointer based (buffers are allocated at a fixed physical address and stay there). Some are handle based (buffers are normally relocatable, but they are pinned to a fixed physical location just before use and unpinned afterwards so they can be moved again). Others use virtual addresses within the driver stack and rely on the GPU page tables to map those virtual addresses to the corresponding physical pages.
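If the handle-based idea is hard to picture, here is a toy model of it. It is only a minimal Python sketch under simplifying assumptions (one flat region of "video memory", a bump allocator, and a compaction step that can only run when nothing is pinned), not how any real driver is written. `HandleAllocator`, `pin`, `unpin` and `compact` are hypothetical names. The point is that clients only hold handles, so the manager is free to move buffers around, except while they are pinned.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    size: int
    offset: int  # current location in (simulated) video memory
    pin_count: int = 0  # > 0 means "in use, must not move"


class HandleAllocator:
    """Toy handle-based allocator: clients keep handles, never raw addresses."""

    def __init__(self, vram_size: int) -> None:
        self.vram_size = vram_size
        self.buffers: dict[int, Buffer] = {}
        self.next_handle = 1
        self.top = 0  # simple bump pointer for new allocations

    def _any_pinned(self) -> bool:
        return any(b.pin_count for b in self.buffers.values())

    def alloc(self, size: int) -> int:
        # Out of room at the top: try to close holes, but only if nothing is pinned.
        if self.top + size > self.vram_size and not self._any_pinned():
            self.compact()
        if self.top + size > self.vram_size:
            raise MemoryError("out of simulated video memory")
        handle = self.next_handle
        self.next_handle += 1
        self.buffers[handle] = Buffer(size, self.top)
        self.top += size
        return handle

    def free(self, handle: int) -> None:
        if self.buffers[handle].pin_count:
            raise RuntimeError("cannot free a pinned buffer")
        del self.buffers[handle]

    def pin(self, handle: int) -> int:
        """Lock the buffer in place and return its address (valid until unpin)."""
        buf = self.buffers[handle]
        buf.pin_count += 1
        return buf.offset

    def unpin(self, handle: int) -> None:
        buf = self.buffers[handle]
        if buf.pin_count == 0:
            raise RuntimeError("unpin without matching pin")
        buf.pin_count -= 1

    def compact(self) -> None:
        """Slide every buffer down to remove holes; handles stay valid."""
        if self._any_pinned():
            raise RuntimeError("cannot move buffers while any are pinned")
        offset = 0
        for buf in sorted(self.buffers.values(), key=lambda b: b.offset):
            buf.offset = offset  # a real driver would also copy the contents
            offset += buf.size
        self.top = offset


mm = HandleAllocator(vram_size=100)
a = mm.alloc(40)  # offset 0
b = mm.alloc(40)  # offset 40
mm.free(a)  # leaves a hole at 0..40
c = mm.alloc(40)  # no room at the top, so compact() moves b down to 0 first
print(mm.pin(b), mm.pin(c))  # 0 40
mm.unpin(b)
mm.unpin(c)
```

Note that `b` changed address without its owner noticing, which is exactly what a pointer-based scheme can't do.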
Another thing to consider is that "GPU memory management" tends to include not only management of dedicated graphics memory attached to the GPU but also the system memory buffers used for graphics operations, including textures, command and vertex buffers, and swapped-out graphics memory buffers.
Using the radeon open source stack as an example, the overall API is based on GEM (Graphics Execution Manager), but the driver-specific memory management is based on TTM (Translation Table Maps). You should be able to find information about both of those, but I don't think any of it is current.
Some basics to get you started:
- GPU memory management in the open source driver covers both video memory and the system memory used by the graphics driver, including the GART area
- memory manager is implemented in the kernel driver
- driver-requested buffers are relocatable (handles rather than pointers) and locked down when in use
- kernel driver copies & patches each command buffer into a buffer from a separate pool, then queues it to the GPU
- other buffers can be moved between system and video memory depending on available memory
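To make the copy/patch/pin cycle in that list a bit more concrete, here is a toy model in Python. It is a minimal sketch, not the actual radeon code path: the relocation format (`Reloc` with an index into the command stream, a handle, and a byte delta), `ToyKernelDriver`, `submit`, `retire_oldest` and `migrate` are all hypothetical, and memory contents are not actually copied. It shows the kernel validating the relocations, copying the command buffer, pinning each referenced buffer and patching in its current address, and refusing to move a buffer between video and system memory until the GPU has finished with it.

```python
from dataclasses import dataclass


@dataclass
class Reloc:
    index: int  # which word of the command buffer holds a buffer address
    handle: int  # the buffer object that word must point at
    delta: int = 0  # byte offset inside that buffer


@dataclass
class BufferObject:
    domain: str  # "vram" or "system" (system pages reached via the GART)
    address: int  # current GPU-visible address
    pin_count: int = 0


class ToyKernelDriver:
    def __init__(self) -> None:
        self.bos: dict[int, BufferObject] = {}
        # Submitted work, oldest first: (patched commands, handles they use).
        self.queue: list[tuple[list[int], list[int]]] = []

    def submit(self, user_cmds: list[int], relocs: list[Reloc]) -> list[int]:
        # Validate everything before touching any state.
        for r in relocs:
            if not 0 <= r.index < len(user_cmds) or r.handle not in self.bos:
                raise ValueError(f"bad relocation {r}")
        cmds = list(user_cmds)  # private copy, so userspace can't change it later
        for r in relocs:
            bo = self.bos[r.handle]
            bo.pin_count += 1  # locked down until the GPU has finished with it
            cmds[r.index] = bo.address + r.delta
        self.queue.append((cmds, [r.handle for r in relocs]))
        return cmds

    def retire_oldest(self) -> None:
        """Called once the GPU completes the oldest submission."""
        _, used = self.queue.pop(0)
        for handle in used:
            self.bos[handle].pin_count -= 1

    def migrate(self, handle: int, domain: str, address: int) -> None:
        """Move an idle buffer between video and system memory."""
        bo = self.bos[handle]
        if bo.pin_count:
            raise RuntimeError("buffer is in use by the GPU and cannot move")
        bo.domain, bo.address = domain, address  # a real driver copies the data here


drv = ToyKernelDriver()
drv.bos[7] = BufferObject("vram", 0x1000)
cmds = [0xC0DE, 0, 0xC0DE]  # word 1 should hold buffer 7's address
relocs = [Reloc(index=1, handle=7, delta=0x10)]
print(hex(drv.submit(cmds, relocs)[1]))  # 0x1010
drv.retire_oldest()  # GPU done, buffer 7 may move again
drv.migrate(7, "system", 0x8000_0000)
print(hex(drv.submit(cmds, relocs)[1]))  # 0x80000010
```

The same user command buffer gets a different patched address on the second submission because the buffer moved in between, which is why userspace never needs to know where its buffers actually live.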
If you want a not-too-painful introduction, I suggest reading up on GART first. GART itself was introduced as part of AGP, but the basic idea (a non-contiguous set of system memory pages made to appear contiguous through a dedicated page table, and addressable by both CPU and GPU) has carried on even though AGP has been replaced by newer buses.
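The GART translation itself is simple enough to show in a few lines. This is a minimal Python sketch that assumes 4 KiB pages and a single flat table with one entry per aperture page; `ToyGart`, `bind` and `translate` are hypothetical names, and real GART hardware adds things like caching, flushing and per-entry flags that are left out here. Scattered physical pages are written into consecutive table entries, so the range starting at the returned aperture address looks contiguous.

```python
from typing import List, Optional

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class ToyGart:
    """Aperture whose pages are backed by arbitrary (non-contiguous) system pages."""

    def __init__(self, aperture_base: int, num_entries: int) -> None:
        self.aperture_base = aperture_base
        # One entry per aperture page: physical address of the backing page, or None.
        self.table: List[Optional[int]] = [None] * num_entries

    def bind(self, first_entry: int, page_addrs: List[int]) -> int:
        """Map scattered physical pages into consecutive aperture slots."""
        if first_entry < 0 or first_entry + len(page_addrs) > len(self.table):
            raise ValueError("range does not fit in the GART")
        if any(phys % PAGE_SIZE for phys in page_addrs):
            raise ValueError("physical pages must be page-aligned")
        for i, phys in enumerate(page_addrs):
            self.table[first_entry + i] = phys
        # The caller sees one contiguous range starting here.
        return self.aperture_base + first_entry * PAGE_SIZE

    def translate(self, addr: int) -> int:
        """Aperture address -> physical address, as the GART does on each access."""
        entry, within = divmod(addr - self.aperture_base, PAGE_SIZE)
        if not 0 <= entry < len(self.table) or self.table[entry] is None:
            raise ValueError("address is not mapped through the GART")
        return self.table[entry] + within


gart = ToyGart(aperture_base=0xE000_0000, num_entries=16)
# Three system pages scattered around physical memory...
base = gart.bind(0, [0x0003_2000, 0x0001_7000, 0x0009_A000])
# ...look like one contiguous 12 KiB buffer starting at `base`.
print(hex(gart.translate(base + PAGE_SIZE + 0x10)))  # 0x17010
```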
Hmm. It's actually hard to even find good docco on GART. The best online docco I could find on short notice was an x.org page on TTM:
http://www.x.org/wiki/ttm