Workloads do not descend from the heavens. An application developer creates them.
If the data locality is bad, or the code branches like crazy, you wrote bad code.
Yes, it will suck less on a CPU than on a GPU, but it will still suck.
A workload is the code and data set the hardware is tasked to run.
Those are informed by the constraints of the problem the programmer is trying to solve.
If the cache hit rate is very high, it will run equally well on a GPU. The only case here is if the cache hit rate is high only at specific cache sizes larger than the GPU's cache. That looks too artificial to me.
It's not artificial when cache sizes are so much larger per thread on the CPU. There are 4 MB of L2 cache for up to 8 CPU threads in Orbis.
Assuming equivalent usage, that's 512 KB of cache per thread to play with.
There is 3/4 MB for up to 720 wavefronts, which is 46,080 threads--if you buy into the marketing.
That's about 1K per wavefront, and 17 bytes per "thread".
How many orders of magnitude are necessary before the example stops being artificial?
edit:
My apologies, I was mentally using a larger GPU.
Orbis's GPU L2 is 512 KB, so cut the per-wavefront and per-thread cache allocations accordingly.
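For what it's worth, here's the arithmetic spelled out with the corrected figures (4 MB of CPU L2, 512 KB of GPU L2, 720 wavefronts of 64 work-items); this is just a sanity check of the numbers in this post, not a statement about either architecture beyond them:

```cpp
#include <cstdio>

// Back-of-the-envelope check of the per-thread cache figures above.
// Sizes are the ones quoted in this thread (4 MB CPU L2; 512 KB GPU L2
// per the correction); thread counts follow GCN's 64-wide wavefronts.
int main() {
    const double cpu_l2_bytes = 4.0 * 1024 * 1024;  // Jaguar L2, 8 threads
    const int    cpu_threads  = 8;

    const double gpu_l2_bytes = 512.0 * 1024;       // per the edit above
    const int    wavefronts   = 720;
    const int    gpu_threads  = wavefronts * 64;    // 46,080 "threads"

    std::printf("CPU: %.0f KB of L2 per thread\n",
                cpu_l2_bytes / cpu_threads / 1024);
    std::printf("GPU: %.0f bytes per wavefront, %.1f bytes per thread\n",
                gpu_l2_bytes / wavefronts, gpu_l2_bytes / gpu_threads);
}
```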
If you have low arithmetic density, do not use a CPU at all. Use a calculator.
I don't normally point out garbage argumentation.
And complex control flow = bad code. Or it's infrastructure code = no need for speed.
Graphics drivers/compilers.
GPU compute run-time managers.
Or are you asserting Orbis might not need those?
CPUs are good for bad/old/legacy code. I know that.
They are also good at code that requires fine-grained synchronization, and there are simply problems that require it.
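A minimal sketch of what I mean by fine-grained synchronization, assuming nothing beyond standard C++ threads: many threads contending on one atomic through a compare-and-swap retry loop, the kind of tight, dependent contention a CPU absorbs far better than a machine running tens of thousands of work-items against the same cache line.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Threads race to claim unique, consecutive tickets via compare-and-swap.
std::atomic<int> next_ticket{0};

void worker(std::vector<int>& out, int count) {
    for (int i = 0; i < count; ++i) {
        int ticket = next_ticket.load(std::memory_order_relaxed);
        // Retry until we atomically claim the ticket value we observed.
        // On failure, compare_exchange_weak reloads 'ticket' for us.
        while (!next_ticket.compare_exchange_weak(ticket, ticket + 1,
                                                  std::memory_order_acq_rel)) {
        }
        out[ticket] = ticket;  // each slot is written by exactly one thread
    }
}

int main() {
    const int threads = 8, per_thread = 1000;
    std::vector<int> out(threads * per_thread, -1);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back(worker, std::ref(out), per_thread);
    for (auto& th : pool) th.join();
}
```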
There are data sets that fall below the minimum the GPU needs for utilization. This is still the case for GCN.
Reduction operations are common, and it follows that if the GPU reduces the data enough times, what's left eventually falls below that minimum.
See how AMD is trying to sell HSA for image recognition. The GPU is faster for the initial broad sweeps, but it falls on its face as the number of tiles drops.
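To illustrate the shrinking-reduction point, here's a sketch with made-up sizes (the threshold and starting count are assumptions, not measurements): a pass-based reduction halves its element count every pass, and at some pass the remaining work can no longer fill the GPU.

```cpp
#include <cstdio>

// Each pass of a tree reduction pairs elements up, halving the count.
// Once the count drops below what it takes to occupy the GPU, the
// remaining passes are better finished on the CPU.
int main() {
    const std::size_t min_items_for_full_gpu = 46080;  // assumed occupancy point
    std::size_t items = 1u << 22;                       // hypothetical input size

    int pass = 0;
    while (items > 1) {
        bool gpu_worthwhile = items >= min_items_for_full_gpu;
        std::printf("pass %2d: %8zu items -> %s\n",
                    pass++, items, gpu_worthwhile ? "GPU" : "CPU");
        items = (items + 1) / 2;
    }
}
```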
Bottom line:
CPUs and GPUs try to solve the same problem: how to keep the caches full in such a fashion that memory bandwidth is saturated all the time.
That is not what CPUs try to do; they can't fully schedule around a miss that goes off-chip.
Under much of their operating range, CPUs do their best to prevent off-die access.
GPUs start from a pessimistic case where they assume off-die access is extremely routine.
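To make that contrast concrete, here's a generic loop-tiling sketch (nothing Orbis-specific; the sizes are assumptions) of the CPU-side strategy: restructure the work so its footprint stays on-die, rather than oversubscribing threads to hide the misses the way a GPU does.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t N = 1024;   // matrix dimension, assumed
constexpr std::size_t TILE = 64;  // assumed: two 64x64 double tiles ~64 KB,
                                  // comfortably inside a per-core L2 slice

// Naive transpose: the writes walk a column at a time, so nearly every
// store touches a new cache line and the working set spills off-die.
void transpose_naive(const std::vector<double>& in, std::vector<double>& out) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            out[j * N + i] = in[i * N + j];
}

// Tiled transpose: operate on TILE x TILE blocks so both the read and the
// write footprint of the inner loops stay resident in cache.
void transpose_tiled(const std::vector<double>& in, std::vector<double>& out) {
    for (std::size_t bi = 0; bi < N; bi += TILE)
        for (std::size_t bj = 0; bj < N; bj += TILE)
            for (std::size_t i = bi; i < bi + TILE; ++i)
                for (std::size_t j = bj; j < bj + TILE; ++j)
                    out[j * N + i] = in[i * N + j];
}

int main() {
    std::vector<double> in(N * N, 1.0), out(N * N, 0.0);
    transpose_naive(in, out);
    transpose_tiled(in, out);
}
```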
The PS4 GPU has been modified for fine-grained compute.
Fine-grained relative to what?
Relative to previous-gen GPUs, sure.