There is a valid reason for not going the "workgroup count" route, and that is marketing. Look at Nvidia and its "CUDA cores" marketing, which counts each FP32-capable ALU as a separate "core" while in reality the control logic is shared at the SM level. It would be bad marketing to advertise "SMs", where the 3080 and the 2080 Ti have the same count (68), when you can advertise "8704" vs "4352" CUDA cores instead. Yes, peak FP32 capability per SM is doubled, but it is also true that you cannot count on that peak rate in every workload. Marketing loves numbers: the bigger the number, the better. Now, if AMD went for a "workgroup" count, it would be advertising a 25% regression per die, even if the FP resources per die were 50% higher. Would you market the card as "30 workgroups per die" vs "40 workgroups per die", or as "7680 stream processors per die" vs "5120 stream processors per die"? For people who understand the tech terms it would be the same, but for most customers 30 is less than 40.
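To put numbers on that, here is a quick sketch of the arithmetic behind the headline counts. The Nvidia figures are the public ones (68 SMs on both cards, 64 vs 128 FP32 lanes per SM). The AMD per-workgroup widths (128 and 256) are my own assumption, picked so that the 30-workgroup die ends up with 50% more FP resources than the 40-workgroup one; `headline_cores` is just a helper name made up for illustration.

```python
# Headline "core" count = number of units x FP32 lanes per unit.
def headline_cores(units: int, fp32_lanes_per_unit: int) -> int:
    return units * fp32_lanes_per_unit

# Nvidia: same SM count, different lane count per SM.
rtx_2080_ti = headline_cores(68, 64)    # 68 SMs, 64 FP32 lanes each
rtx_3080 = headline_cores(68, 128)      # 68 SMs, 128 FP32 lanes each

# Hypothetical AMD case (per-workgroup widths are assumptions, see above):
# fewer workgroups, but each one is twice as wide.
old_die = headline_cores(40, 128)
new_die = headline_cores(30, 256)

print(rtx_2080_ti, rtx_3080)   # 4352 8704
print(old_die, new_die)        # 5120 7680
print(new_die / old_die)       # 1.5  -> 50% more FP lanes
print(30 / 40)                 # 0.75 -> 25% fewer workgroups
```

Same silicon either way; the only thing that changes is which of the two numbers ends up on the box.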
Intel isn’t playing that game and counts EUs, not SIMD lanes. It’s debatable, though, whether the number of EUs/CUs/SMs is more accurate or more helpful than the number of SIMD lanes when it comes to graphics and compute performance.
This is especially true when the width of each of these units is vastly different. An EU is 8-wide and a CU is 64-wide. It doesn’t make sense to compare them directly. Also an EU isn’t functionally equivalent to an SM or CU. It’s closer to just one of the SIMDs (or partitions in Nvidia’s case) within those units.
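As a rough illustration of why the raw unit counts don't line up, here is a sketch that converts unit counts to SIMD lanes using the widths mentioned above (8 per EU, 64 per CU). The two configurations (96 EUs, 80 CUs) are example values I picked, not a claim about any specific product, and `GpuConfig` is a made-up helper type.

```python
from dataclasses import dataclass

@dataclass
class GpuConfig:
    unit_name: str       # what the vendor counts ("EU", "CU", ...)
    units: int           # how many of those units the part has
    lanes_per_unit: int  # SIMD width of one unit

    @property
    def lanes(self) -> int:
        # Total SIMD lanes, the only number comparable across vendors here.
        return self.units * self.lanes_per_unit

# Example configurations (assumed values, for illustration only).
intel_like = GpuConfig("EU", units=96, lanes_per_unit=8)
amd_like = GpuConfig("CU", units=80, lanes_per_unit=64)

for cfg in (intel_like, amd_like):
    print(f"{cfg.units} {cfg.unit_name}s -> {cfg.lanes} lanes")

# How many 8-wide EUs it takes to match the lane count of a single CU.
eus_per_cu = amd_like.lanes_per_unit // intel_like.lanes_per_unit
print(eus_per_cu)  # 8
```

So 96 EUs vs 80 CUs looks close on paper, but in lanes it is 768 vs 5120. Lane count alone still says nothing about clocks, scheduling or how well the lanes are kept busy.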