AMD: Pirate Islands (R* 3** series) Speculation/Rumor Thread

I think it's a nice addition even if it is not being used to its full potential. The bottleneck is going to have to be somewhere, and if the bandwidth bottleneck can be removed, the whole system will perform better in the cases where bandwidth does matter. There are likely 128 ROPs on Fiji, and those will need the bandwidth if you are pushing higher resolutions, I'd imagine.
Let's not forget that it's the CUs that send commands to the ROPs.
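As a rough illustration of why 128 ROPs could actually use that bandwidth, here's a back-of-envelope sketch; the clock and bytes-per-pixel figures below are assumptions for illustration, not Fiji specs:

```python
# Rough, assumed numbers: how much traffic 128 ROPs could generate if kept busy.
rops = 128
clock_ghz = 1.0          # assumed core clock, not a confirmed Fiji figure
bytes_per_pixel = 8      # e.g. 4B colour write + 4B blend read, uncompressed

fill_bw_gbs = rops * clock_ghz * bytes_per_pixel  # GB/s of raw fill traffic
print(f"~{fill_bw_gbs:.0f} GB/s if every ROP writes a pixel each cycle")  # ~1024 GB/s
```

Even at half utilisation that is in the same ballpark as HBM's rumored ~512 GB/s, so the ROPs alone can plausibly soak up the extra bandwidth at high resolutions.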

We can assume the latency improvements might be a nice bonus as well.
I had to google that one and only came up with this: http://www.memcon.com/pdfs/proceedings2014/NET104.pdf. For HBM1, the 'tFAW' has been reduced by 33% compared to DDR4. (No clue what tFAW is for GDDR5, but probably similar.) That's excellent news for agents that need low latency access to random pages, such as CPUs. But for a bandwidth oriented controller, the memory access latency will most likely be determined by the page sorter instead of the lower level characteristics of the DRAM. I don't think it's a coincidence that, in the Hynix slides, latency is mentioned as a benefit for HPC, networking, and data center, but not for GPUs.
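For anyone else who had to look it up: tFAW is the window in which a DRAM rank may issue at most four row ACTIVATE commands, so it caps how quickly you can hop between pages. A toy sketch of why a shorter tFAW helps random-access agents (the timing values are illustrative assumptions, not datasheet numbers):

```python
def activations_per_second(tfaw_ns: float) -> float:
    """tFAW allows at most 4 row ACTIVATE commands per rank within each tFAW window."""
    return 4 / (tfaw_ns * 1e-9)

def random_access_bandwidth_gbs(tfaw_ns: float, bytes_per_access: int) -> float:
    """Bandwidth ceiling (GB/s) when every access has to open a new row (worst case)."""
    return activations_per_second(tfaw_ns) * bytes_per_access / 1e9

# Assumed, illustrative timings: a DDR4-like tFAW of ~30 ns vs one ~33% shorter for HBM1.
for name, tfaw in [("DDR4-ish", 30.0), ("HBM1-ish", 20.0)]:
    bw = random_access_bandwidth_gbs(tfaw, bytes_per_access=64)  # one 64B line per row hop
    print(f"{name}: {activations_per_second(tfaw) / 1e6:.0f}M activates/s max, "
          f"~{bw:.1f} GB/s if every access opens a new row")
```

For streaming workloads with good page locality this ceiling is irrelevant, which is why it matters more for CPUs and networking than for a GPU's bandwidth-oriented controller.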

This is all on top of a power reduction too. I think HBM is a great technology to be implementing even if it's not being pushed to its max potential.
An interesting part of the wccftech article: they praise AMD for holding the power consumption of the 390 to the same level as the 290X, but they overlook the power benefits of HBM. One has to assume that AMD made power improvements to the non-MC parts of Fiji as well, but after playing the trump card of HBM, they're still 40W behind.

AMD prefers higher-density dies to larger dies. It's the same with Hawaii. They are already at the power limit they are comfortable with; adding more CUs would likely not improve efficiency or performance by much at that point. They save some money by making the die smaller and clocking it slightly higher to get the same performance out of the same power budget as a larger chip.
Clocking higher than who?
But yes, their architecture consumes too much power to allow adding more CUs.

From a consumer point of view, these Fiji results are ok, as long as retail price plays its part. From a technical and commercial point of view, they should be troubling for AMD: despite using the most advanced memory technology, they only match the GM200 in performance, yet their product is more expensive to produce (cooling, memory).
 
Then in the fall, when they might actually be able to get a 16nm GPU out there, go all out with the die shrink and HBM.
I love your optimism, but there's no such thing as high-end filler chips, and quick cost reductions. Not even for a company that's flush with money like AMD.
 
What's the point of having unlimited bandwidth when you don't have the computing resources to use it? HBM and 28nm don't mix well.
It's why I keep saying that I doubt it's HBM, yet that new slide indicates it does have HBM. So, it looks like an R600 re-run.

HBM at ~512GB/s and Tonga-style bandwidth efficiency should be in the region of 100% faster than Hawaii, if the overall design is balanced properly, e.g. with a count of, say, 96 CUs and 128 ROPs.
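A crude sanity check on that claim, using Hawaii (R9 290X) as the baseline; the 96 CU / 128 ROP configuration, equal clocks and the ~1.3x "Tonga-style" bandwidth-efficiency factor are assumptions from this discussion, not known specs:

```python
# Back-of-envelope scaling vs Hawaii (R9 290X). The 96 CU / 128 ROP config, equal
# clocks and the ~1.3x "Tonga-style" bandwidth efficiency factor are assumptions.
hawaii     = {"cus": 44, "rops": 64,  "bw_gbs": 320, "bw_eff": 1.0}
fiji_guess = {"cus": 96, "rops": 128, "bw_gbs": 512, "bw_eff": 1.3}

ratios = {
    "compute (CUs)":   fiji_guess["cus"] / hawaii["cus"],
    "fillrate (ROPs)": fiji_guess["rops"] / hawaii["rops"],
    "effective BW":    (fiji_guess["bw_gbs"] * fiji_guess["bw_eff"])
                       / (hawaii["bw_gbs"] * hawaii["bw_eff"]),
}
for name, r in ratios.items():
    print(f"{name}: {r:.2f}x")

# If the design is balanced, the weakest of the three roughly caps the speedup.
print(f"rough ceiling: {min(ratios.values()):.2f}x Hawaii")
```

All three ratios land in the 2.0-2.2x range, which is where the "roughly 100% faster" figure comes from, assuming clocks stay put.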

Perhaps we have to conclude that AMD decided to go with HBM despite not having access to a node that's better than 28nm.

Can GCN scale up to 96 CUs, or is 64 the practical limit? Can GCN's cache architecture scale up to HBM?

Is AMD planning to put L3/ROPs/buffer caches in logic at the base of each stack, which happens on the next process node?
 
R9 290X has 352GB/s in BW. A GTX 980, which outperforms the 290X quite easily most of the time, has 224 GB/s. Now I know that the 980 has compression etc, but I don't think that will compensate for a 57% difference in BW.
The R9 290X uses 5000 MT/s memory on a 512bit bus, so it's 320GB/s and a 43% difference in memory bandwidth.
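For reference, this is just bus width times data rate; the GTX 980's commonly quoted 256-bit / 7 GT/s configuration is assumed here:

```python
# Memory bandwidth = bus width (bits) / 8 * data rate (MT/s), result in GB/s.
def mem_bandwidth_gbs(bus_bits: int, data_rate_mts: float) -> float:
    return bus_bits / 8 * data_rate_mts / 1000

r9_290x = mem_bandwidth_gbs(512, 5000)   # 320 GB/s
gtx_980 = mem_bandwidth_gbs(256, 7000)   # 224 GB/s
print(f"{r9_290x:.0f} vs {gtx_980:.0f} GB/s, "
      f"{r9_290x / gtx_980 - 1:.0%} more for the 290X")
```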

But why aren't you saying the exact same thing between the 780/780Ti and the 980?
The 780 Ti has 336 GB/s, so >50% more bandwidth than the 980, yet the 980 beats the 780 Ti as often as it beats the R9 290X.
Does that mean the 780 Ti is a waste of resources?

Regardless, the 290X and the 780 Ti are less bandwidth constrained than the 980, and it does show in some "extreme" situations:

[benchmark charts]

Now you could argue that nVidia balanced the GM204's bandwidth against its performance in more realistic usage scenarios better than it did for the 780 Ti (and AMD for the R9 290X), and that's fine. But although the GM204 may surpass Hawaii and GK110 in performance, the chip wasn't designed for the same segment.
Hawaii, Fiji, GK110 and GM200 were designed as top-performance chips with fewer constraints on power consumption, transistor count, PCB BOM, etc.
That's why Fiji will be using HBM: because it can.
 
Contrast that with Dying Light, where the GTX 980 is 28% faster than the 290X at 2560x1440:

http://www.hardocp.com/image.html?image=MTQyNTg4MzI2N3BZMUROcVhHOWlfNF80X2wuZ2lm

GM204's architecture is clearly smarter, and that trumps brute bandwidth.

Sure, every generation sees games that favour one or the other architecture. And bandwidth efficiency is usually a big winner with each architecture iteration, so we're comparing an iteration of NVidia's architecture to AMD's prior gen.

Maybe for AMD, HBM is cheaper, overall, than a traditional GDDR5 chip at this performance level. Seems unlikely though. Die area saved versus the cost of the interposer. Are AIBs going to eat the cost of HBM memory? Is 8GB of HBM going to be cheaper than 12GB of GDDR5?

Is a radically smaller PCB for Fiji going to save money? Oh, hang on, the "leaked" pix of the cooler indicate a PCB that's the same, roughly 300mm, size (though I have to admit, I can't think why). Maybe it has substantially fewer layers, as all the complexity is on the package now.

What about simplified power circuitry? Is HBM simpler? Seems unlikely, as the power levels are unchanged.

Or perhaps there's some class of games/compute where HBM is going to enable a smack-down. Again, it seems unlikely. AMD's never seen its compute dominance turn into market share and that's been consistently the case for more than 10 years. So whatever corner case HBM enables isn't going to make games radically better on AMD than on NVidia within the next 18 months, because more raw bandwidth at moderately less power doesn't bring the balance that AMD needs, with Hawaii as the baseline.

The only opening I can see here is that 390X utterly defeats GTX 980Ti (cut-down GM200) and NVidia is forced to compete by pricing Titan X lower than it might have expected. Months after launch, when 390X finally appears.

I just don't see how HBM is needed to enable AMD to do that. And HBM appears to be the cause of significant delays, if we're now talking about May or later for 8GB cards, which are definitely required for 3840x2160 gaming.

To be fair, timing the introduction of HBM, which is a project that must have been running for years now, is going to be seriously difficult. Interposer + HBM are two big, risky, changes that can't arrive independently in discrete graphics. Being stuck on 28nm is a nightmare. But it's not the first time that discrete graphics has been stuck on a process, so AMD knew it was likely.

Anyway, dropping-in HBM to compete against a non-HBM architecture seems like solving the wrong problem.
 
The R9 290X uses 5000 MT/s memory on a 512bit bus, so it's 320GB/s and a 43% difference in memory bandwidth.
Good. That changes my 57% to 43%. But it also increases the disparity between the CU increase and the HBM bandwidth increase even more. :)

But why aren't you saying the exact same thing between the 780/780Ti and the 980?
The 780 Ti has 336 GB/s, so >50% more bandwidth than the 980, yet the 980 beats the 780 Ti as often as it beats the R9 290X.
Does that mean the 780 Ti is a waste of resources?
I wouldn't call the 780 Ti, as a product, a waste of resources. But in hindsight, yes, it's way more wasteful with BW resources. And, indeed, the big X factor here is whether AMD is going to pull a Maxwell-like perf/mm2 rabbit out of its hat. If their CUs have significantly increased efficiency, then all our musings here are wrong. But the Chiphell numbers are all we can go by, and they suggest that this won't be the case.

Regardless, the 290X and the 780 Ti are less bandwidth constrained than the 980, and it does show in some "extreme" situations:
It's to be expected that this would show up in some cases, no need for fancy graphs. It also strengthens my point that BW was already less of a concern for the 290X.

That's why Fiji will be using HBM: because it can.
And Nvidia will laugh its way to the bank with a much cheaper solution that performs just the same.
 
Maybe for AMD, HBM is cheaper, overall, than a traditional GDDR5 chip at this performance level. Seems unlikely though. Die area saved versus the cost of the interposer. Are AIBs going to eat the cost of HBM memory? Is 8GB of HBM going to be cheaper than 12GB of GDDR5?
How about 12GB vs 4GB? The whole HBM solution is a significant jump in complexity and the volumes are going to be low for quite a while.

Or perhaps there's some class of games/compute where HBM is going to enable a smack-down.
Just like there are games where the GTX 980 can't match a GTX 780 or an R9 290X, there are going to be games where the 390 will be way ahead. Probably by a larger factor than we're used to seeing. I don't think AMD has designed their chip such that the peak BW can't theoretically be reached. ;-)

To be fair, timing the introduction of HBM, which is a project that must have been running for years now, is going to be seriously difficult. Interposer + HBM are two big, risky, changes that can't arrive independently in discrete graphics. Being stuck on 28nm is a nightmare. But it's not the first time that discrete graphics has been stuck on a process, so AMD knew it was likely.
Yes. Don't get me wrong: I think HBM is awesome. Let's also not forget that AMD designed Fiji while Nvidia designed Maxwell. With Kepler and first gen GCN roughly similar in performance and efficiency, it would not be a bad guess for AMD to expect Nvidia to make only marginal improvements for their next 28nm chip. It's entirely possible that AMD got blindsided by the improvements of Maxwell. Without those improvements, Fiji (based on the data we currently have!) would have looked fabulous. Maybe it still will...
 
And, indeed, the big X factor here is whether AMD is going to pull a Maxwell-like perf/mm2 rabbit out of its hat.
What is so special about Maxwell's performance per mm²? E.g. GM206 / GTX 960 (227 mm², 1178 MHz) is 9-14 % faster than Curacao / R9 270X (212 mm², 1050 MHz), which is clocked 12 % lower. Comparison of GTX 980 / R9 290X seems to be similar, the main difference in performance is related to higher clock speed. GTX 980 (398 mm²) runs ~35 % faster (real clock) to provide 12-16 % higher gaming performance. At the same clocks R9 290X (438 mm²) is significantly faster than GTX 980. AMD has very good performance per square millimeter, its only problem is high power consumption.
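Running those quoted figures through, just to put numbers on both framings; the die sizes, ~35% real-clock delta and ~14% gaming lead are the rounded values above, so treat this as illustration rather than measurement:

```python
# The rounded figures quoted above: die sizes, ~35% real-clock advantage for the
# GTX 980 and a ~14% gaming performance lead over the R9 290X.
die_980, die_290x = 398.0, 438.0   # mm^2
clock_ratio = 1.35                 # GTX 980 real clock vs R9 290X
perf_ratio  = 1.14                 # GTX 980 gaming performance vs R9 290X

perf_per_mm2   = perf_ratio * die_290x / die_980   # 980 vs 290X, per unit area
perf_per_clock = perf_ratio / clock_ratio          # 980 vs 290X, per unit clock

print(f"GTX 980 perf/mm2 vs R9 290X:   {perf_per_mm2:.2f}x")   # ~1.25x
print(f"GTX 980 perf/clock vs R9 290X: {perf_per_clock:.2f}x")  # ~0.84x
```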
 
For what it's worth, the slide shown in the previous page mentions that this is the "first ever GPU designed for VR immersion". This might just be marketing bullshit, of course, but I wonder if there's something to it. Maybe they're targeting 4K HMDs with SSAA or something. In that case, HBM might make a big difference.
 
GTX 980 (398 mm²) runs ~35 % faster (real clock) to provide 12-16 % higher gaming performance. At the same clocks R9 290X (438 mm²) is significantly faster than GTX 980. AMD has very good performance per square millimeter, its only problem is high power consumption.
In other words: GCN and Maxwell have the same perf/mm2 except for two critical factors that determine performance. :D
 
For what it's worth, the slide shown in the previous page mentions that this is the "first ever GPU designed for VR immersion". This might just be marketing bullshit, of course, but I wonder if there's something to it. Maybe they're targeting 4K HMDs with SSAA or something. In that case, HBM might make a big difference.
Isn't that the same BS that marketing (on both sides) spreads with each new generation? "Designed for 4K" is little more than "It just runs faster." The groundwork for Fiji must have started long before the VR craze really took off. As sebbi pointed out in a different thread, a much better 'made for VR' case could be made for Vulkan/DX12.
 
Isn't that the same BS that marketing (on both sides) spreads with each new generation? "Designed for 4K" is little more than "It just runs faster." The groundwork for Fiji must have started long before the VR craze really took off. As sebbi pointed out in a different thread, a much better 'made for VR' case could be made for Vulkan/DX12.

Yeah, it's probably a better-sounding way of saying "it's really fast, you guys" but who knows, there might be something more. I suppose you could argue that 128 ROPs would qualify as somewhat VR-specific.
 
I would also say that "designed for 4K" means a higher ROP ratio than previous chips.
 
I would also say that "designed for 4K" means a higher ROP ratio than previous chips.
Sure. There's always going to be some grain of truth to justify the claim. Those extra CUs on Fiji are going to help VR as well, and since the 4K song has already been played to death, it was the logical follow up.
 
To be honest, they could easily double down on 4K (and probably triple down for the next generation) because even the 290X and GTX 980 still struggle with a lot of games at that resolution.
 
What is so special about Maxwell's performance per mm²? E.g. GM206 / GTX 960 (227 mm², 1178 MHz) is 9-14 % faster than Curacao / R9 270X (212 mm², 1050 MHz), which is clocked 12 % lower. Comparison of GTX 980 / R9 290X seems to be similar, the main difference in performance is related to higher clock speed. GTX 980 (398 mm²) runs ~35 % faster (real clock) to provide 12-16 % higher gaming performance. At the same clocks R9 290X (438 mm²) is significantly faster than GTX 980. AMD has very good performance per square millimeter, its only problem is high power consumption.

Which looks worse than it actually is, since NVIDIA's reported TDPs aren't what any of their cards actually run at during peak. So even there it's not nearly as bad for AMD. On top of that, Maxwell chips like the 980 have framebuffer compression, same as AMD has now, making their effective bandwidth much higher than the bus width x memory speed figure would suggest next to a card without delta color compression. The Titan/780 Ti weren't particularly wasteful at all with bandwidth; the 980 just gets away with a smaller bus because it compresses.
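As a rough illustration of the "effective bandwidth" point; both the compressible traffic share and the compression ratio below are assumed round numbers, not measured figures:

```python
# Illustrative only: how delta colour compression inflates "effective" bandwidth.
# Both the compressible traffic share and the compression ratio are assumed values.
def effective_bw_gbs(raw_gbs: float, colour_share: float, compression: float) -> float:
    """Raw bandwidth, with the colour/render-target share of traffic scaled by compression."""
    return raw_gbs * colour_share * compression + raw_gbs * (1 - colour_share)

# e.g. GTX 980: 224 GB/s raw; assume ~40% of traffic is compressible colour data at ~1.5x.
print(f"~{effective_bw_gbs(224, 0.40, 1.5):.0f} GB/s effective")  # ~269 GB/s
```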

That a 390X with 640 GB/s of bandwidth, and presumably delta color compression as well, will be unbalanced between compute and bandwidth seems a given. But assuming that HBM is way more expensive than GDDR5 is unwarranted. The R&D costs were certainly large, but there's no indication that the fab costs are grievously higher. It's still silicon, the same material, presumably roughly the same patterning and so on, just with clever engineering to make it stack. And the lower power consumption goes a long way towards bringing AMD closer to Nvidia's perf/watt. Sure, you can nitpick over how each got there and which is "better", but customers aren't going to give a damn.

But just because the card is unbalanced doesn't mean there's any reason whatsoever to go back to a single 512bit bus and GDDR5. If, in total, HBM delivers enough benefit for the cost then it's worth it, and any extra bandwidth is just too much of a good thing, not something to fret over.
 
Which looks worse than it actually is, since NVIDIA's reported TDPs aren't what any of their cards actually run at during peak. So even there it's not nearly as bad for AMD.
I'm not talking TDPs here. I'm talking actual power, playing real games, as measured by various websites such as hardware.fr, which, depending on which data you cherry-pick, show a difference between a 290X and a 980 of anywhere from 35% to 100%. Even a difference of 35% is already past the 'not nearly as bad' territory, but it's usually much more than that.

But assuming that HBM is way more expensive than GDDR5 is unwarranted. The R&D costs were certainly large, but there's no indication that the fab costs are grievously higher. It's still silicon, the same material, presumably roughly the same patterning and so on, just with clever engineering to make it stack.
I don't have any concrete indications of R&D or fab costs. All I have are these known facts:
On one hand, you have a technology that's been around for years. Using a totally conventional mass production packaging technology. With multiple vendors offering the same product. Sold at high volumes.
On the other, you have a technology that has only recently appeared in a product catalog. That uses a seriously complex TSV process to bond 5 dies together. That's offered by exactly one vendor. That currently has a volume of 0, with no major volume expected in the coming year.

This is about fab costs. And packaging costs. And, most important, about a supplier who's in a rare position (for a DRAM manufacturer) to extract meaningful margins from a product.

And that's just for the component itself. For AMD, there's the costly interposer. And all they get in return are some lousy savings on commodity PCB technology.

The only thing under debate is the ratio by which HBM is more expensive than GDDR5.

And the lower power consumption goes a long way towards bringing AMD closer to Nvidia's perf/watt.
That's true. The Chiphell numbers show the difference narrow from 110W (290X vs 980) to 'only' 37W (390 vs Titan X). Yay for AMD!
I definitely hope that Fiji supports GDDR5 as well, so we can make some comparisons.

Sure, you can nitpick over how each got there and which is "better", but customers aren't going to give a damn.
Sure. But this is the architecture, not the purchase decisions forum.

But just because the card is unbalanced doesn't mean there's any reason whatsoever to go back to a single 512bit bus and GDDR5. If, in total, HBM delivers enough benefit for the cost then it's worth it, and any extra bandwidth is just too much of a good thing, not something to fret over.
The whole point of Jawed and me is that HBM doesn't provide the benefit / cost improvement for 28nm. ;-)
 
On top of that, they are including an AIO cooler as their "reference" design, which is somewhat alarming (if the rumors are true, of course). The leaked benchmarks indicate that the power consumption numbers are better than or about the same as those of an R9 290X, yet I can't see the need for such a cooling solution when it adds more cost and risk.
 
On top of that, they are including an AIO cooler as their "reference" design, which is somewhat alarming (if the rumors are true, of course). The leaked benchmarks indicate that the power consumption numbers are better than or about the same as those of an R9 290X, yet I can't see the need for such a cooling solution when it adds more cost and risk.

Don't know why they're doing that cooler design, maybe it's proven to be really quiet and cost-effective for the 295X2? Certainly haven't heard any big news stories about it failing.

But as for Hynix, HBM was developed in direct co-operation with AMD. I don't doubt the contracts include fabbing it for AMD at a minimal profit margin for Hynix, who probably plan to make their money off selling it to other IHVs at a much higher price. That makes the actual cost of including it in the 390 a big question mark. Besides, GDDR5 has always been specialized and reserved for GPUs. It's regular DDR that gets the big economies-of-scale benefits; I've always understood GDDR5 to be quite a bit more expensive.

And while the interposer needed for HBM might be costly, the traditional ram bus on GPUs is also costly. Maybe in total HBM with an interposer doesn't add much if any more to silicon costs in that way. Leaving AMD with a possible net win of lower TDP and higher bandwidth for not much additional cost.
 