#76 | |
|
Junior Member
Join Date: Feb 2010
Posts: 92
|
Quote:
The reason to do it at the page level in particular is that you already have dedicated hardware. The big limitation is that you wouldn't be able to do read-only sharing, since I don't believe the TLB has any concept of read-only, and thus can't throw an interrupt when the first memory write is encountered. You could, however, have a state in your handler that emulates read-only sharing by simply not requesting an invalidation whenever a page belonging to a read-sharable memory space is brought in, though the programmer would have to respect this.

How efficient it is to transfer a multilevel structure would depend on how well malloc can cluster objects together. If the memory page held a large number of other elements of the structure, it would be much more efficient to transfer the entire page than to transfer each element separately and update pointers (a BVH is, admittedly, an extreme example, but there are many cases where you have small multilevel structures you want to use on both devices).

However, false sharing would be *bad*. What you'd need then is some concept of memory spaces for malloc, so that it wouldn't inadvertently allocate some CPU scratch variable in the middle of a memory page containing GPU data. A memory space would simply be some set of memory allocations that's guaranteed not to share pages with allocations belonging to any other memory space. It'd also be important to ensure that any page from a memory space that can be read by multiple devices is small.

The ultimate result would be a system that, while not an ideal coherency protocol, could still retain many of the usability benefits, and would not make anything slower in the traditional usage pattern. The important part is that this could be implemented in software only, so the only things that would have to be updated would be the OS and the drivers.

Last edited by keldor314; 19-Apr-2013 at 08:43.
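To make the "memory spaces for malloc" idea a bit more concrete, here is a minimal sketch in Python. It is only a toy model, not how any real malloc or driver works: addresses are plain integers in a simulated address space, the 4096-byte page size is an assumption, and the names `MemorySpaceAllocator`, `alloc`, and `pages_of` are made up for illustration. The one property it demonstrates is that each space is handed whole pages, so allocations from different spaces never land on the same page.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real systems vary


class MemorySpaceAllocator:
    """Toy bump allocator that hands out whole pages to named memory spaces.

    Hypothetical illustration only: addresses are simulated integers, and
    nothing here touches real memory or a real malloc.
    """

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.next_free_page = 0  # index of the next page nobody owns yet
        self.page_owner = {}     # page index -> name of the owning space
        self.cursor = {}         # space -> (next free address, end of its current run)

    def alloc(self, space, size):
        """Return a simulated address for `size` bytes inside `space`."""
        if size <= 0:
            raise ValueError("size must be positive")
        addr, end = self.cursor.get(space, (0, 0))
        if addr + size > end:
            # Current run is full (or absent): claim fresh pages for this
            # space. The tail of the old run is wasted rather than shared.
            npages = -(-size // self.page_size)  # ceiling division
            first = self.next_free_page
            self.next_free_page += npages
            for page in range(first, first + npages):
                self.page_owner[page] = space
            addr = first * self.page_size
            end = addr + npages * self.page_size
        self.cursor[space] = (addr + size, end)
        return addr

    def pages_of(self, addr, size):
        """Page indices touched by an allocation of `size` bytes at `addr`."""
        last = (addr + size - 1) // self.page_size
        return range(addr // self.page_size, last + 1)


alloc = MemorySpaceAllocator()
gpu_nodes = [alloc.alloc("gpu", 64) for _ in range(3)]  # e.g. small BVH nodes
scratch = alloc.alloc("cpu", 16)                        # a CPU scratch variable
more_gpu = alloc.alloc("gpu", 64)                       # lands back on the GPU page

gpu_pages = {p for a in gpu_nodes + [more_gpu] for p in alloc.pages_of(a, 64)}
cpu_pages = set(alloc.pages_of(scratch, 16))
print("GPU pages:", gpu_pages, "CPU pages:", cpu_pages)
```

The CPU scratch variable gets its own page even though the GPU page still has plenty of room, which is exactly the trade: some wasted space in exchange for never having a page bounce between devices because of unrelated data. A real version would presumably also want the small-page guarantee for read-shared spaces, which this sketch does not model.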
|
|
|
|
|
|
#77 | |
|
AndyTX
Join Date: May 2004
Location: British Columbia, Canada
Posts: 1,885
|
Quote:
How much effort to throw at discrete GPUs/memory spaces depends on how much you believe they are going to matter in the future. With all of the consoles going unified, and arguably everything laptop-level and down as well, it's only going to be the very high-end desktop stuff left as discrete. One could make an argument that those systems could take a more brute-force path and still be acceptable. I have a hard time accepting that APIs should be designed around their constraints going forward, even though I love my massive discrete GPUs.
__________________
The content of this message is my personal opinion only. |
|
|
|
|
|
|
#78 | |
|
Senior Member
|
Quote:
|
|
|
|
|
|
|
#79 | |
|
Junior Member
Join Date: Feb 2010
Posts: 92
|
Quote:
HSA is interesting, but until the CPU socket catches up and closes the order-of-magnitude memory bandwidth gap with the GPU socket, it's useless for high-end stuff.
|
|
|
|
|
|
#80 | |
|
Senior Member
|
Quote:
|
|
|
|
|