Unfortunately, that is my point on complexity: they cannot work independently. You still need coherency/control-management mechanisms in place if you are using this as a scaled-up/scaled-out HPC implementation with said HPC applications. Hence CCIX/CAPI/Gen-Z/Mellanox in general should be added to that list, and their solutions make up over 50% of HPC implementations.
Complexity depends on how they are using it. For certain HPC tasks with large read-only datasets, most of that overhead would be nonexistent. That should be the case for the oil and gas guys, with the implementation somewhat proprietary. Large-scale rendering or raytracing could be similar: multiple GPUs each work on a subset of the screen space, with HBCC caching the pages that get hit. Because each GPU would be completely independent, the control-flow issues go away. CCIX and CAPI are interesting for a certain segment of problems, but they shouldn't necessarily be needed for an SSG-type problem with a SAN. Once the GPUs have to start synchronizing it can get more difficult, but automated paging makes that far easier. It is no different from CPU programming, where data is paged automatically, and that would be familiar to many researchers with limited programming ability. It then becomes a question of efficiency, with any gaps filled by other work thanks to async compute, where practical. So long as all the jobs don't generate stalls simultaneously, the chips should stay near peak performance. If that is occurring, the implementation will be problematic on any hardware.
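The read-only, screen-space-split case above can be sketched as a toy model. Several "GPUs" each own a band of rows and demand-page a shared read-only dataset into their own local memory. This is a minimal sketch under loudly stated assumptions. The 4 KiB page size, the 64 bytes read per pixel, the row-band split, and LRU eviction are all hypothetical choices for illustration. None of it is AMD's actual HBCC policy.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # hypothetical page size (bytes), chosen for illustration only


class PagedCache:
    """Toy LRU demand-paging model of one GPU's local memory.

    Not AMD's HBCC algorithm; just the general idea of pages being
    fetched on first touch and evicted when the local pool is full.
    """

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.resident = OrderedDict()  # page number -> True, ordered by recency
        self.touched = set()           # every page this GPU ever accessed
        self.hits = 0
        self.faults = 0

    def access(self, address):
        page = address // PAGE_SIZE
        self.touched.add(page)
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as most recently used
            self.hits += 1
            return
        self.faults += 1  # page not resident: "fetch" it from backing storage
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict least recently used page
        self.resident[page] = True


def split_screen(height, num_gpus):
    """Give each GPU a contiguous band of rows [y0, y1)."""
    rows = -(-height // num_gpus)  # ceiling division
    return [(g * rows, min(height, (g + 1) * rows)) for g in range(num_gpus)]


def render(width, height, num_gpus, capacity_pages, lookup):
    """Each GPU walks only its own band. The dataset is read-only, so no
    coherence traffic or synchronization between GPUs is modeled."""
    bands = split_screen(height, num_gpus)
    caches = [PagedCache(capacity_pages) for _ in range(num_gpus)]
    for gpu, (y0, y1) in enumerate(bands):
        for y in range(y0, y1):
            for x in range(width):
                caches[gpu].access(lookup(x, y))
    return bands, caches


if __name__ == "__main__":
    W, H = 64, 64
    # Hypothetical layout: each pixel reads 64 bytes of the shared dataset.
    bands, caches = render(W, H, num_gpus=4, capacity_pages=8,
                           lookup=lambda x, y: (y * W + x) * 64)
    for gpu, c in enumerate(caches):
        print(f"GPU {gpu}: rows {bands[gpu]}, faults={c.faults}, hits={c.hits}")
```

With this layout each row maps to exactly one page. Each of the four GPUs therefore takes 16 page faults and 1,008 hits. No two GPUs ever touch the same page, which is the "completely independent" case described above. A workload where bands share pages and write to them would need the coherency machinery the quoted post argues for.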
Bear in mind my context is purely HPC, coming back to Vega20.
If you go back to AMD's Hot Chips 2017 presentation "Radeon Next Generation GPU Architecture", its footnotes on HBCC address the context you raise (storage at a high level, rather than specifically a multi-accelerator/multi-node implementation):
This feature (Inclusive Cache Model) is still in development and may be better utilized in future releases of Radeon Software, SDKs available via GPUOpen, or updates from the owners of 3D graphics APIs.
The Inclusive Cache model is what requires working with storage. But as I said, for HPC it would need to be highly complex, with overheads and higher-level support; look at solutions involving the technologies I mentioned earlier.
Edit:
To be clearer: the Inclusive Cache model is what is required when the product is not the Pro SSG used as a workstation solution.