The context just holds memory allocations. It's not going to gain any speed from HSA.
Well, let's say you want to grep the results of your sort (which I think makes much more sense to do on a CPU) to find patterns, or present them to an Excel range for a client; now you have to pass that whole context back into the CPU's memory space, right? I'd be thrilled if that were a fast, easily abstracted process on Kaveri.
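To make the point concrete, here's a minimal sketch (not anyone's actual code) of what that "pass it back" step looks like on a discrete-GPU style setup using the plain OpenCL host API: you explicitly read the whole buffer into host memory before ordinary CPU code can scan it. The buffer contents, the size `N`, and the grep-like pass are all placeholders; the claim about HSA/hUMA in the final comment is the hoped-for behavior under OpenCL 2.0 SVM, not something this snippet demonstrates.

```c
/* Sketch: after a GPU-side sort, get results back where host code can read them.
 * Error handling omitted; assumes an OpenCL 1.2+ platform with a GPU device. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

#define N 1024  /* illustrative size */

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, NULL);

    /* Stand-in for the sorted results living in the GPU context. */
    int *src = malloc(N * sizeof(int));
    for (int i = 0; i < N; i++) src[i] = i;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                N * sizeof(int), src, NULL);

    /* Classic path: explicitly copy the whole buffer back into host memory
     * before the CPU can touch it. This is the transfer being complained about. */
    int *host_copy = malloc(N * sizeof(int));
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, N * sizeof(int), host_copy,
                        0, NULL, NULL);

    /* Now plain CPU code can do the grep-like scan (or feed Excel, etc.). */
    int hits = 0;
    for (int i = 0; i < N; i++)
        if (host_copy[i] % 100 == 0) hits++;
    printf("matches: %d\n", hits);

    /* On an HSA/hUMA part like Kaveri with OpenCL 2.0 SVM, the idea is that a
     * clSVMAlloc'd allocation is visible to both sides, so this explicit
     * copy-back step would go away -- the CPU reads what the GPU wrote. */

    clReleaseMemObject(buf);
    clReleaseCommandQueue(q);
    clReleaseContext(ctx);
    free(src);
    free(host_copy);
    return 0;
}
```

Whether that zero-copy path is actually fast and easy to abstract on Kaveri is exactly the open question in this thread.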