I can think of a few ways that would make such professional "mining" boards more attractive in comparison to consumer cards:
1) On-package multichip interconnect
Instead of multi-chip or multi-card solutions, place several lower-cost GPUs and shared HBM memory in the same chip package.
This would make professional solutions scale in a cost-effective and power-efficient way.
I believe this is rumored for Navi 10.
Are there specific crypto-coins that are limited by the main chip's processing or internal transfers?
The rigs I've seen go out of their way to put as many discrete cards onto a board as possible, hooked up to as few PCIe lanes as possible. The GPU is usually clocked low and undervolted if possible, with the memory pushed as high as is feasible.
I see a possible use case for chaining multiple chips from a single PCIe slot without a bridge chip, although the link speeds needed to get a DAG into a local DRAM pool don't appear to be that limiting, given that miners are content to plug many cards into risers or PCIe cables dangling from as many motherboard PCIe x2 slots as are available.
Maybe create a minimal mobile/external GPU package using external PCIe connectivity. If on a dedicated board, perhaps give them the ability to daisy-chain from a single PCIe x16 slot.
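To put rough numbers on the DAG point, here is a back-of-the-envelope sketch. It uses the Ethash spec's DAG growth constants but skips the spec's small prime-size adjustment (so it slightly overestimates), and it assumes an idealized PCIe 3.0 link at about 0.985 GB/s per lane. The block number is only an illustrative input. Many miners also generate the DAG on the GPU from the much smaller cache, so this is an upper bound on what the link has to carry, and only once per epoch.

```python
# Ethash DAG growth constants (from the Ethash spec).
DATASET_BYTES_INIT = 2**30    # 1 GiB at epoch 0
DATASET_BYTES_GROWTH = 2**23  # +8 MiB per epoch
EPOCH_LENGTH = 30000          # blocks per epoch

# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding -> ~0.985 GB/s per lane per direction.
PCIE3_BYTES_PER_LANE = 8e9 * 128 / 130 / 8


def approx_dag_bytes(block_number: int) -> int:
    """Upper-bound DAG size; the real spec trims it slightly so the row count is prime."""
    epoch = block_number // EPOCH_LENGTH
    return DATASET_BYTES_INIT + DATASET_BYTES_GROWTH * epoch


def transfer_seconds(num_bytes: int, lanes: int) -> float:
    """Idealized transfer time, ignoring protocol overhead beyond line encoding."""
    return num_bytes / (PCIE3_BYTES_PER_LANE * lanes)


if __name__ == "__main__":
    dag = approx_dag_bytes(5_000_000)  # illustrative block number
    for lanes in (1, 2, 16):
        print(f"x{lanes}: {transfer_seconds(dag, lanes):.2f} s for {dag / 1e9:.2f} GB")
```

Even a single lane moves a ~2.5 GB DAG in a few seconds, and that only has to happen once every 30,000 blocks, which is why narrow risers don't hurt hash rate.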
2) Binning for low power
Professional chips can be binned for the lowest possible power requirements, increasing their cost-efficiency ratio. One example is the R9 Nano, which offered about the same ~8 TFLOPS of peak performance as the R9 Fury X at roughly two-thirds of the power (175 W versus 275 W).
This should also reduce failure rates, which is critical for 24/7 operation in datacenters and mining farms.
I suppose this depends on the state of the mining market, and it may be too late at this point. Right now the market is so overheated and supply-limited that any chip able to reach the underclocked speeds miners run their cards at could be sold at a profit. And if a card can make back its purchase price before it breaks, the reliability standard could arguably be relaxed.
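The "make back its purchase price before it breaks" test is simple arithmetic. A minimal sketch, where every input (card price, daily revenue, power draw, electricity price, expected service life) is a hypothetical placeholder, and difficulty growth and coin-price swings are ignored:

```python
def payback_days(card_price: float, daily_revenue: float,
                 card_watts: float, price_per_kwh: float) -> float:
    """Days until mining profit covers the card's price; inf if it never does."""
    daily_power_cost = card_watts / 1000 * 24 * price_per_kwh  # kWh per day * price
    daily_profit = daily_revenue - daily_power_cost
    if daily_profit <= 0:
        return float("inf")
    return card_price / daily_profit


def worth_buying(card_price: float, daily_revenue: float, card_watts: float,
                 price_per_kwh: float, expected_life_days: float) -> bool:
    """True if the card is expected to pay for itself before it fails."""
    return payback_days(card_price, daily_revenue, card_watts, price_per_kwh) < expected_life_days


if __name__ == "__main__":
    # Hypothetical inputs: $900 card, $4.00/day revenue, 200 W, $0.10/kWh.
    days = payback_days(900, 4.00, 200, 0.10)
    print(f"payback in {days:.0f} days")
```

With those made-up numbers the card pays for itself in about 256 days, so even a card with a service life of around a year clears the bar, which is the sense in which reliability requirements could loosen.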
3) Optimised software
Compute-optimised drivers, additional shader languages targeting SPIR-V bytecode, standard processing library (like CUDA from Nvidia), early preview of Vulkan-based OpenCL, etc.
This would also depend on whether the market remains overheated, but so long as a card is profitable, does it matter as much? The larger outfits can invest in optimized code, and perhaps a single programming avenue would be enough, or none at all if a card is profitable anyway and the situation is supply-limited.
If the mining market remains as speculatively driven and supply-limited as it is now (a risky long-term assumption at this point), points 2 and 3 suggest another way to "cater" to miners. They're paying ridiculous mark-ups, so the goal isn't to make the product more effective per dollar; they're obviously being under-charged.
Create instructions that can noticeably help mining performance or efficiency, then throttle them.
Determine what patterns are needed for mining, then determine what lowered voltage+gating+clock levels those need, and then make them unavailable on standard firmware.
Charge extra for a card with the limits lifted, or charge for a driver version+firmware update that makes them available.
Vega's PSP (Platform Security Processor) already serves as a barrier AMD can use to restrict such features.
4) Fast processor interconnect
HyperTransport, PCIe x32 slots with the PCIe 4.0/5.0 protocol, etc. to make NUMA nodes from each card.
This is similar to point 1: is there a mining target that is constrained by that part of the system?
Coins like Ethereum are purposefully bottlenecked by local DRAM bandwidth, in part to make them ASIC-resistant and to prevent what is believed to be an unfair advantage from large SMP setups or clusters, which would use a high-speed system interconnect to get massive numbers of chips and memory to scale in performance.
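The DRAM-bandwidth bottleneck can be made concrete. Per the Ethash spec, each hash performs 64 random reads of 128-byte pages from the DAG, i.e. 8 KiB of memory traffic per hash, so bandwidth alone caps the hash rate regardless of shader throughput. A minimal sketch; the ~484 GB/s input is Vega 64's nominal peak, and real cards land below this ceiling because random 128-byte accesses don't sustain peak DRAM efficiency.

```python
# Ethash constants from the spec: 64 DAG accesses per hash, 128 bytes each.
ACCESSES = 64
MIX_BYTES = 128
BYTES_PER_HASH = ACCESSES * MIX_BYTES  # 8192 bytes of DAG traffic per hash


def ethash_ceiling_mhs(mem_bandwidth_gbs: float) -> float:
    """Bandwidth-only upper bound on Ethash hash rate, in MH/s.

    Ignores compute limits and DRAM inefficiency, so real rates are lower.
    """
    return mem_bandwidth_gbs * 1e9 / BYTES_PER_HASH / 1e6


if __name__ == "__main__":
    # Nominal peak bandwidth for Vega 64 (HBM2, 2048-bit bus).
    print(f"Vega 64 ceiling: {ethash_ceiling_mhs(484):.1f} MH/s")
```

Doubling shader throughput leaves this number unchanged, while doubling bandwidth doubles it, which is why miners undervolt the core and push the memory clock.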
5) Factory bundles
4-8 cards in one boxed package, intended for small mining farms.
Charge extra for bundles of cards that can have their mining instructions and DVFS levels unlocked with a shared enablement key.
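As a rough illustration of a "shared enablement key" for a bundle, here is a sketch that binds one key to a set of card serial numbers plus a list of unlocked features using an HMAC. Everything here is hypothetical: the serial format, the feature names, and the vendor secret. A real scheme would more likely use an asymmetric signature checked in firmware, so cards hold only a public key rather than the shared secret this stdlib-only sketch needs.

```python
import hashlib
import hmac

# Hypothetical vendor secret. With an HMAC the verifier needs it too, which is
# why a production design would swap in an asymmetric signature scheme.
VENDOR_SECRET = b"hypothetical-vendor-secret"


def _bundle_message(serials: list[str], features: list[str]) -> bytes:
    """Canonical encoding so card/feature order doesn't change the key."""
    return ("|".join(sorted(serials)) + "#" + ",".join(sorted(features))).encode()


def issue_bundle_key(serials: list[str], features: list[str]) -> str:
    """Vendor side: one key shared by every card in the bundle."""
    msg = _bundle_message(serials, features)
    return hmac.new(VENDOR_SECRET, msg, hashlib.sha256).hexdigest()


def card_accepts(serial: str, bundle_serials: list[str],
                 features: list[str], key: str) -> bool:
    """Card side: unlock only if this card is in the bundle and the key matches."""
    if serial not in bundle_serials:
        return False
    expected = issue_bundle_key(bundle_serials, features)
    return hmac.compare_digest(expected, key)


if __name__ == "__main__":
    bundle = ["SN0001", "SN0002", "SN0003", "SN0004"]  # hypothetical serials
    key = issue_bundle_key(bundle, ["dvfs_unlock", "mining_ops"])
    print(card_accepts("SN0003", bundle, ["dvfs_unlock", "mining_ops"], key))  # True
```

Because the feature list is part of the authenticated message, a key issued only for DVFS unlocking won't also enable the mining instructions.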
The fact that current professional GPU solutions like the $2100 Radeon Pro WX 9100 and the $7000 Radeon Pro SSG offer no real additional value for mining/HPC applications compared to high-end 'consumer' cards like Vega 64 is clearly an indication that we are going through a paradigm shift that GPU companies did not anticipate.
For HPC, the SSG is a step in the direction AMD expects is necessary long-term. DRAM capacity, cost, and static/refresh power consumption are expected to scale poorly, and inter-node communication costs more power once data spills out of local storage. AMD proposes some kind of non-volatile pool near the GPU to compensate, but there are more complex trade-offs depending on workload and access patterns.
Nvidia faces this problem as well: look at their EULA affair with Sakura, where they tried to artificially restrict the use of consumer cards in professional environments.
Not for mining; that was specifically carved out of the datacenter restriction.