I don't think the memory portion of the power is the critical element, more the PHY, which is "on core" either way.
The idea that each pixel of each frame requires about 2KB of data (source + intermediate + pixel) is just absurd: 3840x2160 x 60 fps x 2000 bytes = ~1,000,000,000,000 bytes per second, i.e. about 1 TB/s.
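The arithmetic above can be checked with a quick sketch. The 2000 bytes per pixel is the "about 2KB" figure from the post, treated here as bytes per pixel per frame; the function and variable names are made up for illustration.

```python
# Back-of-the-envelope check of the 4K @ 60 fps figure quoted above.
WIDTH, HEIGHT = 3840, 2160
FPS = 60
BYTES_PER_PIXEL = 2000  # the "about 2KB" figure from the post (assumed per pixel, per frame)


def required_bandwidth(width, height, fps, bytes_per_pixel):
    """Return the memory traffic in bytes per second."""
    return width * height * fps * bytes_per_pixel


total = required_bandwidth(WIDTH, HEIGHT, FPS, BYTES_PER_PIXEL)
print(f"{total:,} bytes/s = {total / 1e12:.3f} TB/s")
```

That comes out just under 1 TB/s, which is why the 2KB figure looks implausible next to the 256GB/s per stack discussed in this thread.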
I wonder if a "split" lineup for at least the first generation of 14/16 nm AMD GPUs makes sense, one like the following:
This very much depends on the economics of 14nm vs 28nm, and the expected size of the laptop market that would like Pitcairn-level perf at much lower power use. If that market is small enough (just Apple?) then satisfying it with that 150mm² die severely underclocked like the Fury Nano might be a good idea. If it's bigger, reducing the manufacturing cost might be worth it.
In any case, the design + mask work per die type for 14nm will be much more than it was at 28nm, so it's sane to expect fewer designs even if that means being slightly less efficient per mm². A full lineup made of only two distinct dies (with plenty of harvested variants) might not be that insane.
AMD 2016 GPU lineup (from Oland up)
GDDR5       | HBM2
------------|------------------------------
            | HBM2 4 stacks [450-500 mm^2]
            | HBM2 3 stacks [300-400 mm^2]
Hawaii      |
Tonga       |
Pitcairn    | HBM2 1 stack [125-150 mm^2]
Bonaire     |
Cape Verde  |
Oland       |
This spot targets MacBook Pro-type laptops which desire decent performance with small size and very low power consumption. Tonga and other parts remain an option for those whose first priority is low price.
I don't really expect the announced doubled frequency of HBM2 initially. Or at least I won't be surprised if it's just going to be somewhat higher than with HBM1. That said, such a chip should imho still be able to possibly replace both Tonga and Pitcairn.
I'm expecting that 14nm mid-range part with a single HBM2 stack to actually have performance similar to a full Tonga. A single stack drives 256GB/s at 500MHz, which is more than the 176GB/s that the R9 380 has and very close to the 260GB/s that a 384bit Tonga would achieve.
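For reference, the GDDR5 figures in that comparison follow from bus width times per-pin data rate. A minimal sketch, assuming an effective GDDR5 rate of 5.5 Gbps per pin (an assumption chosen because it reproduces the 176GB/s figure on a 256-bit bus); the 256GB/s per HBM2 stack is taken from the post as-is.

```python
def peak_bandwidth_gbs(bus_width_bits, gbps_per_pin):
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


GDDR5_RATE_GBPS = 5.5  # assumption: effective per-pin rate, not stated in the thread

tonga_256bit = peak_bandwidth_gbs(256, GDDR5_RATE_GBPS)  # the 256-bit configuration
tonga_384bit = peak_bandwidth_gbs(384, GDDR5_RATE_GBPS)  # the hypothetical full 384-bit Tonga
hbm2_single_stack = 256.0                                # GB/s, figure quoted in the post

for name, bw in [("256-bit GDDR5", tonga_256bit),
                 ("384-bit GDDR5", tonga_384bit),
                 ("1x HBM2 stack", hbm2_single_stack)]:
    print(f"{name}: {bw:.0f} GB/s")
```

Under that assumption the 384-bit case lands at 264GB/s, in line with the "260GB/s" ballpark above.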
I've stated many times that I expect AMD to come up with architectural improvements for their next generation: they must have done *something* in the last 3 years. If all goes well, this new architecture will negate the requirement to use HBM, just the way it does for Nvidia.
They could use underclocked+undervolted single-stack HBM2 for a mobile chip with a performance target of GM107/Bonaire.
(I don't think, though, that replacing anything at Bonaire level or below would be an option with such a chip, well maybe for mobile where AMD really needs all the perf/W improvements they can possibly get.)
Eeerm.. is HBM2's increased bandwidth related to frequencies alone? I thought it was simply 2x wider from using 2x more stacks.
Do you know where I can read about that?
The slide also says HBM2 transfers 64-byte chunks of data instead of 32. So something's fishy there.
http://videocardz.com/55259/sk-hynix-shows-off-hbm1-and-hbm2
The sheet says HBM2 doubles the data rate, without touching the I/O prefetch, so it must be the interface clock going up.
p.s.: A single 8-high HBM2 stack could provide an 8GB pool at 256GB/s throughput -- a perfect solution for an ultra-low power APU in a compact package.
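To make the data-rate argument concrete, here is a rough sketch of how per-stack bandwidth relates to interface clock. It assumes a 1024-bit interface per stack and double data rate signaling (two transfers per clock); neither number is spelled out in the thread, so treat both as assumptions, and the function names are made up for illustration.

```python
HBM_BUS_WIDTH_BITS = 1024  # assumption: interface width of one HBM stack


def stack_bandwidth_gbs(clock_mhz, transfers_per_clock=2):
    """Peak GB/s of one stack at a given interface clock (DDR assumed by default)."""
    gbps_per_pin = clock_mhz * transfers_per_clock / 1000
    return HBM_BUS_WIDTH_BITS * gbps_per_pin / 8


def clock_for_bandwidth_mhz(target_gbs, transfers_per_clock=2):
    """Interface clock in MHz needed to reach a target per-stack bandwidth."""
    gbps_per_pin = target_gbs * 8 / HBM_BUS_WIDTH_BITS
    return gbps_per_pin * 1000 / transfers_per_clock


print(stack_bandwidth_gbs(500))      # bandwidth at a 500 MHz interface clock
print(clock_for_bandwidth_mhz(256))  # clock needed for 256 GB/s from one stack
```

With the width held fixed, doubling the per-stack bandwidth means doubling the per-pin rate, which under these assumptions means doubling the interface clock.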
Halve the Fiji GPU (32 CUs) + add four Zen cores (4 cores = 8 threads) on the same die + 8 GB of 256GB/s HBM2. Would make a perfect high-end gaming laptop (with reduced GPU clocks to get better perf/watt). No discrete GPU required.
Btw, how do the HBM2 latencies compare to DDR3/DDR4? The CPU part needs low latency.