That's not the only issue with graphics APIs. Another one is the strict ordering requirement: objects must be processed in the order they were submitted, which would limit parallelism significantly unless the work is synchronized between chiplets.
There is little doubt that the front end must be reworked to account for chiplets. I guess the new front end would have to be very smart to distribute the work across multiple chiplets efficiently. It should batch more draw calls and track more state than ever to keep utilization high.
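To make the ordering problem concrete, here is a rough toy sketch in Python, not a description of how any real GPU front end works. It assumes draw calls are handed out round-robin to chiplets, finish in arbitrary order, and must still be retired in submission order. All names (`Result`, `distribute`, `retire_in_order`) are made up for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Result:
    seq: int                              # submission order, the only sort key
    chiplet: int = field(compare=False)   # hypothetical chiplet that ran the draw
    name: str = field(compare=False)      # label for the draw call


def distribute(draws, num_chiplets):
    """Round-robin assignment: draw i goes to chiplet i % num_chiplets."""
    return [(seq, seq % num_chiplets, name) for seq, name in enumerate(draws)]


def retire_in_order(completions):
    """Accept results in any completion order, release them in submission order."""
    pending = []     # min-heap of finished results, keyed by seq
    next_seq = 0     # the next submission index allowed to retire
    retired = []
    for result in completions:
        heapq.heappush(pending, result)
        # Drain every result that is now contiguous with what has retired.
        while pending and pending[0].seq == next_seq:
            retired.append(heapq.heappop(pending))
            next_seq += 1
    return retired


draws = ["sky", "terrain", "character", "ui"]
assigned = distribute(draws, num_chiplets=2)
# Pretend the chiplets finished in the order 2, 0, 3, 1.
finished = [Result(*assigned[i]) for i in (2, 0, 3, 1)]
print([r.name for r in retire_in_order(finished)])
# prints ['sky', 'terrain', 'character', 'ui']
```

The point of the toy is that draw 2 finishes first but has to sit in `pending` until draws 0 and 1 are done. That waiting is the synchronization cost, and in this simplified picture it would involve traffic between chiplets rather than staying on one die.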
I’m thinking it’s the exact same challenge as in monolithic dies. The only difference is that communication there has lower latency because it’s on die. The question is whether the increased latency and lower bandwidth between chiplets are enough to require a new approach to distributing work.