In this paper we propose the MCM-GPU as a collection of GPMs that share resources and are presented to software and programmers as a single monolithic GPU. Pooled hardware resources and shared I/O are concentrated in a shared on-package module (the SYS + I/O module shown in Figure 1). The goal for the MCM-GPU is to provide the same performance characteristics as a single (unmanufacturable) monolithic die. By doing so, the operating system and programmers are isolated from the fact that a single logical GPU may now comprise several GPMs working in conjunction. There are two key advantages to this organization. First, it enables resource sharing of underutilized structures within a single GPU and eliminates hardware replication among GPMs. Second, applications can transparently leverage bigger and more capable GPUs without any additional programming effort.
Alternatively, on-package GPMs could be organized as multiple fully functional and autonomous GPUs connected by very high speed interconnects. However, we do not propose this approach due to its drawbacks and inefficient use of resources. For example, if implemented as multiple GPUs, splitting the off-package I/O bandwidth across GPMs may hurt overall bandwidth utilization. Other common architectural components such as virtual memory management, DMA engines, and hardware context management would also be private rather than pooled resources. Moreover, operating systems and programmers would have to be aware of potential load imbalance and data partitioning between tasks running on an MCM-GPU organized as multiple independent GPUs in a single package.
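The programmer-visible difference between the two organizations can be sketched with standard CUDA device-management calls. This is only an illustrative sketch: the `scale` kernel and buffer names are hypothetical, and the single-device view of the MCM-GPU reflects the proposal's intent rather than the behavior of any existing product.

```cuda
#include <cuda_runtime.h>

// Hypothetical example kernel used by both sketches below.
__global__ void scale(float *data, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

void run_single_logical_gpu(float *d_data, size_t n) {
    // Proposed MCM-GPU organization: the package enumerates as ONE device,
    // so unmodified single-GPU code runs as-is and the hardware distributes
    // thread blocks across GPMs transparently.
    scale<<<(n + 255) / 256, 256>>>(d_data, n);
}

void run_multi_gpu_alternative(float **d_part, const float *h_data, size_t n) {
    // Alternative organization (not proposed): each GPM is a separately
    // visible device, so the programmer must partition data and balance
    // load explicitly across devices.
    int count = 0;
    cudaGetDeviceCount(&count);          // e.g. one device per GPM
    size_t chunk = n / count;            // manual data partitioning
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);              // select one GPM's device context
        cudaMemcpy(d_part[dev], h_data + dev * chunk,
                   chunk * sizeof(float), cudaMemcpyHostToDevice);
        scale<<<(chunk + 255) / 256, 256>>>(d_part[dev], chunk);
    }
}
```

The second function also omits the cross-device synchronization and result gathering a real multi-GPU program would need, which is precisely the extra programming effort the single-logical-GPU organization avoids.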