ATI's Ring Bus Memory Controller and Vista

rwolf

I went through the Vista Direct3D 10 presentations and it looks like virtual memory is one of the big features that Microsoft wants to implement.

ATI's Ring Bus memory controller takes up substantial die space and is highly programmable. Do you think it might be possible that the Ring Bus memory controller supports virtual memory? Can anyone from ATI comment on this?
 
The early roadmaps that all the 16-1-1-1 etc. numbers come from state Virtual Memory for R520 (and "NO VM" for RV515).
 
rwolf said:
I went through the Vista Direct3D 10 presentations and it looks like virtual memory is one of the big features that Microsoft wants to implement.

ATI's Ring Bus memory controller takes up substantial die space and is highly programmable. Do you think it might be possible that the Ring Bus memory controller supports virtual memory? Can anyone from ATI comment on this?

Virtual memory is not a D3D10 feature; it's a feature of the new Vista/Longhorn driver model. Every card that can transfer blocks of memory between system and video RAM in both directions is able to support it.
 
Demirug said:
Virtual memory is not a D3D10 feature; it's a feature of the new Vista/Longhorn driver model. Every card that can transfer blocks of memory between system and video RAM in both directions is able to support it.
It's a lot more complex than that! If it were that simple, every card that supported system-to-video and video-to-system blits would be able to support virtual memory, and that's not the case. Virtual memory on GPUs should be just like virtual memory on CPUs, which means the chip must use page tables to decode addresses for memory accesses. In some ways PTEs are similar to GART entries; however, with PTEs, the OS manages them and takes care of swapping data between local, system, and paged (i.e. swapped) memory as needed, just like it does for system memory.
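
To make that concrete, here is a minimal sketch of what such a translation might look like, assuming 4 KiB pages and a single-level table; real chips may well use multi-level tables and other page sizes, and every structure here is invented for illustration.

Code:
#include <stdint.h>

#define PAGE_SHIFT 12u                       /* 4 KiB pages (assumed)     */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

typedef struct {
    uint64_t frame;         /* physical frame number                      */
    unsigned present : 1;   /* is the page resident anywhere?             */
    unsigned local   : 1;   /* 1 = on-board VRAM, 0 = system memory       */
} pte_t;

/* Translate a GPU virtual address through the page table; a non-present
 * entry is the GPU's equivalent of a page fault and needs OS help. */
uint64_t translate(const pte_t *table, uint64_t vaddr, int *fault)
{
    const pte_t *pte = &table[vaddr >> PAGE_SHIFT];
    if (!pte->present) {
        *fault = 1;
        return 0;
    }
    *fault = 0;
    return (pte->frame << PAGE_SHIFT) | (vaddr & PAGE_MASK);
}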
 
Didn't the P10 from 3DLabs already have a form of virtual addressing for texture data, or was it just multi-level caching?!
 
Wouldn't the logic for GPU VM also be different? Access patterns, page-size granularity, etc. are all different compared to OS-level VM aimed at applications.

Would you page in part of a texture? How would your page-out strategy account for the developer's rendering order and for what happened in previous frames? I remember John Carmack once complained of a situation where LRU was breaking down and MRU would have been better.

Imagine you have 300 MB of textures and 256 MB of VRAM, and every frame you touch all of your textures. Some textures you touch many times; others are used once and never again. You could easily get into a situation where a bunch of "one use" textures are used in a row and some of the frequently used ones get evicted because they are "least recent".
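
Here's a toy trace of that pathology under strict LRU; the slot count and texture ids are made up, but the eviction pattern is the point: the burst of one-use textures flushes the hot set, so the next pass over the hot set misses on every texture.

Code:
#include <stdio.h>

#define SLOTS 4

static int vram[SLOTS]  = {0, 1, 2, 3};  /* texture ids currently resident */
static int last_use[SLOTS];              /* timestamp of each slot's last access */

static void touch(int tex, int now)
{
    int i, lru = 0;
    for (i = 0; i < SLOTS; i++)
        if (vram[i] == tex) { last_use[i] = now; return; }   /* hit */
    for (i = 1; i < SLOTS; i++)          /* miss: evict least-recently-used */
        if (last_use[i] < last_use[lru]) lru = i;
    printf("evict texture %d to load texture %d\n", vram[lru], tex);
    vram[lru] = tex;
    last_use[lru] = now;
}

int main(void)
{
    /* Hot set 0-3, a burst of one-use textures 10-13, then the hot set
     * again: the burst evicts every hot texture, so the second pass
     * misses on all of them. */
    int trace[] = {0, 1, 2, 3, 10, 11, 12, 13, 0, 1, 2, 3};
    int t;
    for (t = 0; t < (int)(sizeof trace / sizeof trace[0]); t++)
        touch(trace[t], t);
    return 0;
}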
 
DemoCoder said:
Wouldn't the logic for GPU VM also be different? Access patterns, page-size granularity, etc. are all different compared to OS-level VM aimed at applications.

Would you page in part of a texture? How would your page-out strategy account for the developer's rendering order and for what happened in previous frames? I remember John Carmack once complained of a situation where LRU was breaking down and MRU would have been better.

Imagine you have 300 MB of textures and 256 MB of VRAM, and every frame you touch all of your textures. Some textures you touch many times; others are used once and never again. You could easily get into a situation where a bunch of "one use" textures are used in a row and some of the frequently used ones get evicted because they are "least recent".
Prioritization is up to the OS, not the graphics card, just as system memory usage is not dictated by devices.
 
Yes, I realize that, but what about page size and memory layout/access patterns? These don't seem typical compared to ordinary virtual memory patterns, which get their primary benefit from the fact that most application code and data is relatively coherent/local. The GPU's streaming nature tends to mean that if developers map X megabytes of stuff, they will touch all of it.


So how does this work, anyway? The GPU accesses memory mapped to a texture that isn't resident, generates a fault, the driver picks up the signal, pages out some texture if necessary, transfers the new texture, updates the GPU mapping table, etc. That seems like it will add several hundred cycles of latency to an already slow operation. Couldn't the GPU just go ahead and start the transfer itself without relying on the OS to do it?
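
Roughly, that CPU-driven round trip might look like the sketch below; every hook is hypothetical, and each numbered step happens while the faulting GPU context sits stalled.

Code:
#include <stdint.h>

struct gpu;    /* opaque device state; all hooks below are hypothetical */
struct page;

struct page *lookup_backing_page(struct gpu *g, uint64_t vaddr);
int          vram_full(struct gpu *g);
struct page *pick_victim(struct gpu *g);
void         evict_to_system_memory(struct gpu *g, struct page *p);
void         dma_upload(struct gpu *g, struct page *p);
void         update_gpu_page_table(struct gpu *g, uint64_t vaddr, struct page *p);
void         resume_faulting_context(struct gpu *g);

/* CPU/driver-side handler for a GPU page fault: every step below is a
 * round trip across the bus while the faulting GPU context is stalled. */
void on_gpu_page_fault(struct gpu *gpu, uint64_t vaddr)
{
    struct page *wanted = lookup_backing_page(gpu, vaddr);

    if (vram_full(gpu))                           /* 1. make room if needed  */
        evict_to_system_memory(gpu, pick_victim(gpu));
    dma_upload(gpu, wanted);                      /* 2. transfer the page(s) */
    update_gpu_page_table(gpu, vaddr, wanted);    /* 3. repair the mapping   */
    resume_faulting_context(gpu);                 /* 4. restart the access   */
}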
 
DemoCoder said:
Yes, I realize that, but what about page size and memory layout/access patterns? These don't seem typical compared to ordinary virtual memory patterns, which get their primary benefit from the fact that most application code and data is relatively coherent/local. The GPU's streaming nature tends to mean that if developers map X megabytes of stuff, they will touch all of it.
Plenty of applications allocate resources that are seldom, if ever, used.
DemoCoder said:
So how does this work, anyway? The GPU accesses memory mapped to a texture that isn't resident, generates a fault, the driver picks up the signal, pages out some texture if necessary, transfers the new texture, updates the GPU mapping table, etc. That seems like it will add several hundred cycles of latency to an already slow operation. Couldn't the GPU just go ahead and start the transfer itself without relying on the OS to do it?
How can the GPU do this? What memory will it use for the data transfer? If all memory is committed, something has to get paged out as well, which is why the OS has to handle all of this.
 
There are several possible capabilities of a GPU that get mixed together in this discussion (see the sketch after this list):
  • Can the GPU execution units access off-board (AGP/PCIE) memory areas as if they were on-board?
  • Does the GPU support a page table to translate memory addresses?
  • Is the GPU itself capable of modifying the page table and performing the associated data moves with/without OS intervention (using system memory as a "swapfile")?
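
Expressed as hypothetical, independent feature bits (a chip could implement any subset):

Code:
struct gpu_vm_caps {
    unsigned remote_access : 1; /* EUs can read AGP/PCIe memory directly   */
    unsigned page_tables   : 1; /* accesses go through a translation table */
    unsigned autonomous_vm : 1; /* GPU updates its own tables and performs
                                   the data moves without OS intervention  */
};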
 
OpenGL guy said:
It's a lot more complex than that! If it were that simple, every card that supported system-to-video and video-to-system blts would be able to support virtual memory, and that's not the case. Virtual memory on GPUs should be just like virtual memory on CPUs, which means that the chip must use page tables to decode addresses for memory access. In some ways PTEs are similar to GART entries, however, with PTEs, the OS manages them and takes care of swapping data between local, system and paged (i.e. swapped) memory as needed just like it does for system memory.

For the X1000 series, is this a subject you can't comment on, or can you spill the beans and tell us what support these GPUs have for VM and how it works?
 
Dave Baumann said:
The early roadmaps that all the 16-1-1-1 etc. numbers come from state Virtual Memory for R520 (and "NO VM" for RV515).

I am surprised that ATI doesn't actively promote this feature. I suppose they don't want another F-Buffer-like feature that never gets used.
 
OpenGL guy said:
Plenty of applications allocate resources that are seldom, if ever, used.

OK, I'll make this real simple: what's the minimum page size supported, and how does this relate to GPU application usage patterns?


OpenGL guy said:
How can the GPU do this? What memory will it use for the data transfer? If all memory is committed, something has to get paged out as well, which is why the OS has to handle all of this.

Well, I presume that mapped memory is read-only, so paging out amounts to updating your mapping tables and possibly dealing with "in-flight" processing that is referencing the data. I see no reason why you can't have a programmable "VM controller" just like you have a programmable memory controller. In a typical OS implementation, the requestor is the CPU, which handles all the logic, and the secondary storage device is an I/O device (typically a disk).

In the GPU scenario, the GPU is the requestor, the CPU handles all the logic, and the secondary storage device is the main memory of the CPU. This would be like having an OS VM system where your "disk drive" ran the VM logic. I'm merely positing a more intelligent GPU VM controller that can avoid the round trip to the CPU and treat the CPU as a mere I/O device.

To what gain? Reduction of round-trip latency.
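
A sketch of what such a GPU-side controller might do on a miss, assuming read-only mappings so the reclaimed page can simply be dropped; every name here is invented.

Code:
#include <stdint.h>

struct gpu;  /* all of the hooks below are invented for illustration */

uint64_t backing_copy_address(struct gpu *g, uint64_t vaddr); /* copy already in system RAM */
int      reclaim_readonly_slot(struct gpu *g);  /* read-only page: no write-back needed */
void     start_dma_from_system(struct gpu *g, uint64_t src, int slot);
void     update_local_pte(struct gpu *g, uint64_t vaddr, int slot);

/* Hypothetical on-GPU miss handler: the GPU starts the transfer itself,
 * treating CPU memory as a mere "I/O device" - no CPU fault-handler
 * round trip is needed for read-only data. */
void gpu_vm_miss(struct gpu *gpu, uint64_t vaddr)
{
    uint64_t src  = backing_copy_address(gpu, vaddr);
    int      slot = reclaim_readonly_slot(gpu);
    start_dma_from_system(gpu, src, slot);
    update_local_pte(gpu, vaddr, slot);
}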
 
The OS must handle the virtual memory swapping because the GPU could swap out to main memory or disk. In addition, the OS/application can swap in segments in advance, before the memory is actually needed.
 
arjan de lumens said:
Can the GPU execution units access off-board (AGP/PCIE) memory areas as if they were on-board?
I think that's a primary intent: that applications don't have any idea whether a memory location is on-board or off-board. The application never has to "think" about this; it just happens under its nose.

This is because multiple applications will all be running simultaneously on the GPU, each of which needs a protected memory space, and none of which wants to care about the proportion of on-board RAM it currently has.

Jawed
 
Jawed said:
I think that's a primary intent: that applications don't have any idea whether a memory location is on-board or off-board. The application never has to "think" about this; it just happens under its nose.

This is because multiple applications will all be running simultaneously on the GPU, each of which needs a protected memory space, and none of which wants to care about the proportion of on-board RAM it currently has.

Jawed
At the user application level, "automatic" memory management has been available through both OpenGL and Direct3D for ages - in fact, OpenGL has never provided any non-automatic memory management mechanism.
 
I'm suffering from brain-fade tonight :oops: - I think I'll go watch that 911 conspiracy theory vid on Google.

Jawed
 
DemoCoder said:
OK, I'll make this real simple: what's the minimum page size supported, and how does this relate to GPU application usage patterns?
I believe the page size is the same for GPUs as for CPUs, for obvious compatibility reasons. If the page sizes of the CPU and GPU differed, you'd have issues updating a single entry on one or the other.
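
For example, if a GPU hypothetically used 16 KiB pages against the CPU's 4 KiB pages (sizes chosen purely for illustration), one GPU entry would span several CPU entries, and a single CPU-side update could leave the GPU entry partially stale:

Code:
#define CPU_PAGE_SIZE  4096u    /* typical x86 page size                 */
#define GPU_PAGE_SIZE  16384u   /* hypothetical mismatched GPU page size */

/* One GPU PTE would span GPU_PAGE_SIZE / CPU_PAGE_SIZE = 4 CPU PTEs, so
 * updating one CPU-side entry covers only a quarter of the GPU page;
 * with equal page sizes the two tables map entry-for-entry instead. */
unsigned cpu_ptes_per_gpu_pte(void)
{
    return GPU_PAGE_SIZE / CPU_PAGE_SIZE;
}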
DemoCoder said:
Well, I presume that mapped memory is read-only, so paging out amounts to updating your mapping tables and possibly dealing with "in-flight" processing that is referencing the data. I see no reason why you can't have a programmable "VM controller" just like you have a programmable memory controller. In a typical OS implementation, the requestor is the CPU, which handles all the logic, and the secondary storage device is an I/O device (typically a disk).
Page faults are handled by an OS routine, aren't they? I'm pretty sure there's a fault handler set up for this.

I just looked at the Linux source tree, and I see all sorts of stuff for handling VM, so it looks like it's not automated at all.
DemoCoder said:
In the GPU scenario, the GPU is the requestor, the CPU handles all the logic, and the secondary storage device is the main memory of the CPU. This would be like having an OS VM system where your "disk drive" ran the VM logic. I'm merely positing a more intelligent GPU VM controller that can avoid the round trip to the CPU and treat the CPU as a mere I/O device.

To what gain? Reduction of round-trip latency.
How could this work with resources in system memory? The CPU manages its own PTEs (with OS support), so the OS would have to update the GPU's PTEs whenever the CPU's table was updated. Ideally, the two tables would be updated simultaneously to prevent race conditions.


Note that none of my comments are based on what Vista actually does, as I have little knowledge there; I am just making these comments based on my knowledge of GPUs and other OSes I have worked on (like Linux).
 
rwolf said:
The OS must handle the virtual memory swapping because the GPU could swap out to main memory or disk. In addition, the OS/application can swap in segments in advance, before the memory is actually needed.

There is no a priori reason why mapped memory must be writable. The principal problem with memory on GPUs to date is not using too much framebuffer, but using too many art assets.

You don't "swap them out" to main memory; there is a copy in main memory already.
 
DemoCoder said:
There is no a priori reason why mapped memory must be writable. The principal problem with memory on GPUs to date is not using too much framebuffer, but using too many art assets.

You don't "swap them out" to main memory; there is a copy in main memory already.
Having two copies of every managed texture is not very efficient, and is even impractical in some cases.
 