Yes, there's a lot more to a modern SoC memory subsystem than just making requests on ports of a certain width. There's internal buffering, transaction priorities and outstanding transaction queues, burst behaviour tuning, aggressive last-level caching, etc. The major clients of the memory controller in a modern SoC all behave differently in their requests, too. Some are read-only, some are write-heavy, some are bursty, some are heavy streamers, and some need latency as low as possible to satisfy internal block "QoS". The GPU is a strange one in particular because it puts different loads on the memory subsystem depending on where in your render you are.
Tuning a memory controller and connected ports is therefore a balancing act, and one that makes "peak bandwidth" incredibly difficult to provide to any one requester, especially in a modern consumer device where there's always non-negligible bandwidth needed to serve the display at least.
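To put rough numbers on that balancing act, here is a toy sketch. Everything in it is an assumption made up for illustration: the `Client` type, the `allocate` function, the strict-priority scheme, the 75% efficiency figure and all of the bandwidth values. Real controllers arbitrate in far more elaborate ways, so treat it as a back-of-the-envelope model rather than how any actual SoC works.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    demand_gbps: float  # bandwidth the client would like (hypothetical, GB/s)
    priority: int       # lower number is served first (hypothetical scheme)


def allocate(clients, peak_gbps, efficiency):
    """Hand out achievable bandwidth to clients in strict priority order.

    `efficiency` is an assumed fraction of theoretical peak that survives
    refresh, bank conflicts, read/write turnarounds and so on.
    """
    remaining = peak_gbps * efficiency
    grants = {}
    for client in sorted(clients, key=lambda c: c.priority):
        grant = min(client.demand_gbps, remaining)
        grants[client.name] = grant
        remaining -= grant
    return grants


# All figures below are invented for the example.
clients = [
    Client("display", demand_gbps=2.0, priority=0),  # must never underflow
    Client("cpu", demand_gbps=4.0, priority=1),      # latency sensitive
    Client("gpu", demand_gbps=32.0, priority=2),     # would take everything
]

grants = allocate(clients, peak_gbps=32.0, efficiency=0.75)
for name, grant in grants.items():
    print(f"{name}: {grant:.1f} GB/s")
```

Even in this simplified model the GPU ends up well short of the headline peak figure, because the display and CPU are served first and the controller never reaches its theoretical maximum anyway.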