Work on a particular APD can be constrained by the speed of the portion of the interconnect fabric assigned to that APD (such as the PCIe connection assigned to that APD). More specifically, it is possible for work on the APD to be processed more quickly than work can be transferred over the PCIe connection to the APD. The techniques herein increase the effective bandwidth for transfer of data from the CPU to the APD and/or from the APD to the CPU through cooperation of one or more other APDs in a multi-APD system. For a write to a "target" APD, the technique involves transmitting data both directly to the target APD and indirectly to the target APD through one or more other APDs (designated as "helper" APDs). The one or more helper APDs then transmit the data to the target APD through a high-speed inter-APD interconnect. Although data transferred "indirectly" through a helper APD may take more time to reach the target APD than data transferred directly, the total effective bandwidth to the target APD is increased due to the high-speed inter-APD interconnect. For a read operation from a "source" APD, the technique is similar, but reversed. More specifically, the technique involves transmitting data from the source APD to the processor both directly and indirectly through one or more helper APDs. A "source" APD (that is, an APD involved in a read operation from the processor 102) may also sometimes be referred to herein as a "target" APD.
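The direct-plus-indirect write described above can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the description: the function names `split_transfer` and `effective_bandwidth` are hypothetical, the data is divided in proportion to each path's usable bandwidth (the description does not specify a split policy), each helper path is modeled as limited by the slower of its own PCIe connection and the inter-APD interconnect, and latency and forwarding overhead are ignored. Bandwidths are integers in arbitrary units.

```python
def split_transfer(total_bytes, direct_bw, helper_bws, inter_apd_bw):
    """Divide a write to the target APD across the direct path and helper paths.

    Assumed model (not from the description): each helper path is limited by
    the slower of its own PCIe connection and the inter-APD interconnect, and
    bytes are assigned in proportion to usable path bandwidth.
    Returns (shares, path_bws); index 0 is the direct path.
    """
    path_bws = [direct_bw] + [min(bw, inter_apd_bw) for bw in helper_bws]
    total_bw = sum(path_bws)
    # Integer proportional split; rounding leftovers go to the direct path.
    shares = [total_bytes * bw // total_bw for bw in path_bws]
    shares[0] += total_bytes - sum(shares)
    return shares, path_bws


def effective_bandwidth(total_bytes, shares, path_bws):
    """Bandwidth seen by the target APD when all paths run in parallel.

    The transfer completes when the slowest path finishes (latency ignored).
    """
    finish_time = max(share / bw for share, bw in zip(shares, path_bws))
    return total_bytes / finish_time


if __name__ == "__main__":
    # Hypothetical numbers: one helper APD, equal PCIe connections,
    # and a faster inter-APD interconnect.
    shares, bws = split_transfer(1000, direct_bw=16, helper_bws=[16], inter_apd_bw=50)
    print(shares, effective_bandwidth(1000, shares, bws))
```

Under this simplified model, adding a helper APD whose PCIe connection matches the target's doubles the bandwidth seen by the target, provided the inter-APD interconnect is at least as fast as the helper's PCIe connection. A read from a source APD could be modeled the same way with the direction reversed.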
FIG. 1 is a block diagram of an example device 100 in which one or more features of the disclosure can be implemented. The device 100 could be one of, but is not limited to, for example, a computer, a gaming device, a handheld device, a set-top box, a television, a mobile phone, a tablet computer, or other computing device. The device 100 includes a processor 102, a memory 104, a storage 106, one or more input devices 108, and one or more output devices 110. The device 100 also includes one or more input drivers 112 and one or more output drivers 114. Any of the input drivers 112 are embodied as hardware, a combination of hardware and software, or software, and serve the purpose of controlling input devices 108 (e.g., controlling operation, receiving inputs from, and providing data to input devices 108). Similarly, any of the output drivers 114 are embodied as hardware, a combination of hardware and software, or software, and serve the purpose of controlling output devices 110 (e.g., controlling operation, receiving inputs from, and providing data to output devices 110). It is understood that the device 100 illustrated and described is an example and can include additional components not shown in FIG. 1, or may omit one or more components illustrated in FIG. 1.