How much would it cost to extend NEON to 512 bits and add gather support?
Do you have access to the HDL code, or just the RTL, or just a hard macro? I don't know, ask ARM, the kings of low power, but I doubt they are going that route (yet). But I'd sure like to see a NEON-512 with all the nifty vector instructions. MIPS, errr, IMG said MIPS64 has a complete set of vector extensions; I assume they have been reading your posts with interest these past years.
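For anyone wondering what gather actually buys you: below is a rough scalar model of what a hypothetical 512-bit NEON gather over 32-bit elements would compute (16 lanes, one independent load per lane). The function name `gather` and the zero-fill for masked-off lanes are my own assumptions for illustration, not any ARM instruction or intrinsic; real ISAs differ on masking (some keep the old register value instead).

```python
# Hypothetical scalar model of a 512-bit gather; NOT a real NEON instruction.
VECTOR_BITS = 512
ELEMENT_BITS = 32
LANES = VECTOR_BITS // ELEMENT_BITS  # 16 lanes of 32-bit elements


def gather(memory, indices, mask=None):
    """Lane i receives memory[indices[i]]; masked-off lanes are set to 0 here.

    Each lane performs its own independent load, which is what makes gather
    expensive in hardware compared to a contiguous vector load.
    """
    if len(indices) != LANES:
        raise ValueError(f"expected {LANES} indices, got {len(indices)}")
    if mask is None:
        mask = [True] * LANES
    return [memory[i] if active else 0 for i, active in zip(indices, mask)]


table = [10 * i for i in range(64)]
indices = [63 - 4 * lane for lane in range(LANES)]  # scattered, descending
print(gather(table, indices))
```

A contiguous load could only fetch `table[k:k+16]`; the gather fetches 16 arbitrary entries, which is what makes table lookups and indirect addressing vectorizable.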
OT: Some time ago, inspired by your ideas, I ran the numbers on the available Calxeda-SoC-based products, but they weren't there yet in FLOPS/Watt, being Cortex-A9-based and only quad-core. Maybe octa-core parts with A15/A5x and FMA will get there as a possible solution for render farms. (For example:
http://www.boston.co.uk/solutions/viridis/viridis-2u.aspx but with twice the cores and twice the FLOPS/clock at the same power consumption.)
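The back-of-the-envelope FLOPS/Watt comparison as a quick sketch. The per-SoC numbers below are made-up placeholders, not Calxeda, Boston or Cortex-A9/A15 specs; the only point is that doubling both the core count and the FLOPS/clock at the same power gives 4x the peak FLOPS/Watt.

```python
def peak_gflops(cores, clock_ghz, flops_per_clock):
    """Theoretical peak: cores x GHz x FLOPs per clock per core.

    An FMA is conventionally counted as 2 FLOPs, which is why FMA support
    can double flops_per_clock.
    """
    return cores * clock_ghz * flops_per_clock


def gflops_per_watt(cores, clock_ghz, flops_per_clock, watts):
    return peak_gflops(cores, clock_ghz, flops_per_clock) / watts


# Placeholder numbers for one SoC, NOT real vendor figures.
baseline = gflops_per_watt(cores=4, clock_ghz=1.0, flops_per_clock=2, watts=5.0)
# Twice the cores, twice the FLOPS/clock, same power.
doubled = gflops_per_watt(cores=8, clock_ghz=1.0, flops_per_clock=4, watts=5.0)

print(f"baseline: {baseline:.2f} GFLOPS/W, doubled: {doubled:.2f} GFLOPS/W")
print(f"improvement: {doubled / baseline:.1f}x")
```

Peak numbers only, of course; sustained FLOPS in a render farm will be lower and depend on the memory system.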
ARM likely licenses those Mali-400s as hard macros for the fabs that companies like Allwinner use. Here you'd be on your own with the physical design portion. Even if you use synthesis tools, I don't think it's totally trivial.
"(Die area of an open-source GPU synthesized in an evening minus die area of the GPU hard macro) times $/mm²", in $/die, is the magic number to compare to the price of licensing an ARM/IMG/... GPU. It comes with AXI interfaces, so there's a path to lowest-effort, no-fscks-given ARM integration.
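The same magic number as a sketch. All areas, $/mm² and license figures are hypothetical placeholders, not quotes from ARM, IMG or any foundry, and spreading the up-front license fee over an assumed production volume is my own simplification of how licensing costs are structured.

```python
def extra_silicon_cost_per_die(open_gpu_mm2, hard_macro_mm2, dollars_per_mm2):
    """(open GPU die area - hard-macro die area) x $/mm^2, in $ per die."""
    return (open_gpu_mm2 - hard_macro_mm2) * dollars_per_mm2


def license_cost_per_die(upfront_fee, units_shipped, royalty_per_unit):
    """Assumed model: up-front fee spread over the production run, plus royalty."""
    return upfront_fee / units_shipped + royalty_per_unit


# Hypothetical placeholders only.
extra = extra_silicon_cost_per_die(open_gpu_mm2=12.0, hard_macro_mm2=8.0,
                                   dollars_per_mm2=0.10)
licensed = license_cost_per_die(upfront_fee=1_000_000, units_shipped=10_000_000,
                                royalty_per_unit=0.35)

print(f"extra silicon: ${extra:.2f}/die, license: ${licensed:.2f}/die")
print("open GPU wins" if extra < licensed else "licensed GPU wins")
```

The comparison flips easily: a less area-efficient synthesized core or a cheaper license quickly eats the margin, which is why the die-area delta is the number that matters.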
I also have serious doubts that these guys can come anywhere close to being competitive in the mobile space, starting with a 1997-era design with only a year and $1m. The GPU could be totally free for SoC makers but if it's a total joke it won't matter.
The $1m unified-shader project would be a new design; the 1997-compatible part is just VGA and bitblt, as far as I can see. They could just design a GCN-compatible core for giggles. And "competitive" is relative; to paraphrase Jawed's signature: "Can it play Angry Birds?" (At least Angry Birds version 1.0 mostly used bitblt, glCopySubImage or something like that, IIRC; I once traced it with the Adreno Profiler on my Nexus One.)
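For reference, bitblt is just a rectangular block copy between pixel buffers. A minimal sketch, assuming plain Python lists stand in for framebuffers and the rectangle is in bounds; real blitters also support raster operations (AND/OR/XOR with the destination), which are left out here. The one subtle part is overlap: when copying within the same buffer, the copy direction has to be chosen so source rows aren't clobbered before they're read.

```python
def bitblt(src, dst, src_x, src_y, dst_x, dst_y, width, height):
    """Copy a width x height block of pixels from src to dst.

    src and dst are lists of rows (lists of pixel values); the rectangle is
    assumed to be in bounds. When src and dst are the same buffer and the
    block moves down, rows are copied bottom-up so source rows are not
    overwritten before they are read.
    """
    rows = range(height)
    if src is dst and dst_y > src_y:
        rows = reversed(rows)
    for row in rows:
        src_row = src[src_y + row]
        # The right-hand slice is a copy, so overlap within one row is safe.
        dst[dst_y + row][dst_x:dst_x + width] = src_row[src_x:src_x + width]


fb = [[4 * r + c for c in range(4)] for r in range(4)]
# Scroll a 2-wide, 3-tall block down by one row within the same buffer.
bitblt(fb, fb, src_x=0, src_y=0, dst_x=0, dst_y=1, width=2, height=3)
print(fb)
```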
To be clear, I agree with the "total joke" part with respect to performance, performance/watt, features and software driver support. But I'm not sure about the $/SoC part.
It's kind of moot though, I see no chance of them getting that $1m. In fact, I'd be surprised if they got the $200k and really, really surprised if they got the $400k.
Agreed, their only chance is a Chinese sugar daddy willing to win the race to the bottom. IIRC Allwinner already uses an OpenRISC CPU at less than 100MHz for system tasks (power management, IIRC) instead of licensing a Cortex-M for pennies. Googled: yep
http://linux-sunxi.org/AR100
Edit: I forgot ETC1 licensing royalties. The patent is still valid, isn't it?