AMD: Pirate Islands (R* 3** series) Speculation/Rumor Thread

TPU has Titan X performing 43% faster than 290X. So if we take 290X at about 300 W, scale up Hawaii's shader array to match Titan X performance, and subtract 30 W for memory interface savings, we get (1.43 * 300) - 30 ≈ 400 W.

However, 290X is not the most power-efficient Hawaii SKU: it gets 18.67 GFlops/W, while 295X2 gets 22.6 GFlops/W. So let's assume Fiji gets the same power efficiency as the best Hawaii GPUs. Then Fiji burns (1.43 * 300 * 18.67/22.6) - 30 ≈ 325 W.

So, it seems that for Fiji to beat Titan X, it will need somewhere between 325 and 400 W (assuming AMD doesn't have other performance/watt improvements in their shader array).
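
The two bounds can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this post: the function name and its parameters are made up for illustration, and the inputs (43% performance gap, 300 W for 290X, 30 W memory savings, 18.67 and 22.6 GFlops/W) are the figures quoted here, not measurements.

```python
def scaled_power_w(perf_ratio, base_power_w, base_eff, target_eff, mem_savings_w=0.0):
    """Estimate board power for a scaled-up part.

    Power grows linearly with the performance target, shrinks with the
    ratio of efficiencies (GFlops/W), and a fixed memory-interface
    saving is subtracted at the end.
    """
    return perf_ratio * base_power_w * (base_eff / target_eff) - mem_savings_w


# Figures quoted in the post (assumptions, not measurements).
PERF_RATIO = 1.43      # Titan X vs 290X according to TPU
BASE_POWER_W = 300.0   # 290X board power used in the post
EFF_290X = 18.67       # GFlops/W
EFF_295X2 = 22.6       # GFlops/W
MEM_SAVINGS_W = 30.0   # assumed memory interface saving

# Upper bound: Fiji is no more efficient than 290X.
upper = scaled_power_w(PERF_RATIO, BASE_POWER_W, EFF_290X, EFF_290X, MEM_SAVINGS_W)
# Lower bound: Fiji matches the best Hawaii SKU (295X2).
lower = scaled_power_w(PERF_RATIO, BASE_POWER_W, EFF_290X, EFF_295X2, MEM_SAVINGS_W)

print(f"290X-level efficiency:  {upper:.0f} W")
print(f"295X2-level efficiency: {lower:.0f} W")
```

The unrounded results are about 399 W and 324 W, which is where the 325 and 400 W bounds come from.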

I'm guessing that Fiji will be a 350 W part, and more or less match Titan X performance. I also think Nvidia will allow its partners to release various overclocked GM200 based parts that are somewhere between 15-40% faster than Titan X (at correspondingly increased power budgets).

There's also the Tonga-based R9 M295X that can do 2048 × 2 × 0.850 = 3,481.6 GFLOPS at 125W, or 27.85 GFLOPS/W. Granted, it's a mobile SKU running at a lower clock speed, but it gives you some idea of what might be possible with a lot of binning, and it's based on the latest iteration of GCN.

For what it's worth: 1.43 × 300 × 18.67/27.85 = 288W.
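
For anyone who wants to check these numbers: peak single-precision throughput is shader count × 2 FLOPs per clock (one fused multiply-add) × clock in GHz. A minimal sketch, with hypothetical helper names and the 125 W figure taken from the post above:

```python
def peak_gflops(shaders, clock_ghz, flops_per_clock=2):
    """Peak FP32 GFLOPS, counting one FMA (2 FLOPs) per shader per clock."""
    return shaders * flops_per_clock * clock_ghz


def gflops_per_watt(gflops, power_w):
    """Efficiency figure used throughout the thread."""
    return gflops / power_w


# R9 M295X figures as quoted in the post.
m295x_gflops = peak_gflops(shaders=2048, clock_ghz=0.850)
m295x_eff = gflops_per_watt(m295x_gflops, power_w=125)

# Power needed to reach 1.43x a 300 W 290X (18.67 GFlops/W) at that efficiency.
# No memory interface savings are subtracted here, matching the post.
fiji_power_w = 1.43 * 300 * 18.67 / m295x_eff

print(f"{m295x_gflops:.1f} GFLOPS, {m295x_eff:.2f} GFLOPS/W, {fiji_power_w:.0f} W")
```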
 
Recent listings of the R9 390X with 8GB are leading some to believe the 390 series will in fact be Hawaii, with Fiji having a different name away from the Rx series (like Titan).
Perhaps the best proof of this is that there's a consumer (Asus STRIX) 2GB R9 380 model in there. I don't think they'd ever release a Hawaii with 2GB, so 380 has to be Tonga.
This would mean that 3x0 series = 2x0 series, being synched with the recent OEM announcements (R9 380 = cut-down Tonga).

Not only has the execution been rather bad since the introduction of the GCN range 3.5 years ago (AMD has released 3 newer GPUs since then), but the people responsible for the naming schemes seem to be going through a constant flux of brainfarts.

I'd prefer to believe that Fiji will have 8GB versions, but that is getting less and less likely every day (to me, Macri's interview on Techreport with his pretty mediocre "we got a couple of dudes working on it" excuses was pretty much confirmation that 2015's Fiji will only have 4GB).
And by June, the almost 2-year-old Hawaii rebrand will have twice the VRAM of AMD's top-end offering.

GPU releases have been slow and delayed. CPU releases have pretty much stagnated since Vishera's initial release in 2012. Performance APUs have had 2 distinct generations since 2012 (Trinity and Kaveri, both with lukewarm receptions) and Carrizo seems delayed to death... What the hell?!

And the worst offender will definitely be Pitcairn in the 300 series, a GPU that will lack any of the features from GCN >1.0 GPUs like TrueAudio and Freesync.
Lacking FreeSync is downright terrible IMO. It should be a huge playability boost for slower GPUs and Radeon 370 users will lack it.
 
With regards to HBM, does anyone know what the burst transfer length is? All I know is the width per stack is 1024 bits.
256 Bytes.

No. Each stack logically shows up as 8 independent channels, each providing 2-clock bursts, or 32 bytes per channel.

from an nvidia slide said:
Each HBM stack provides 8 independent memory channels
These are completely independent memory interfaces
Independent clocks & timing
Independent commands
Independent memory arrays
In short, nothing one channel does affects another channel
 
No. Each stack logically shows up as 8 independent channels, each providing 2-clock bursts, or 32 bytes per channel.
Thanks for the info, BTW do you have a link to said presentation?
Hynix documentation points to that size (256B) for the access granularity. Is that different from the burst length?
access granularity = bus width * burst length
8 channels * 32 bytes per channel = 256 bytes
 
The mass rebranding - it is a fraud and a scam. That's enough to take legal action against AMD.
You're clearly new to the industry - both NVIDIA and AMD have done it several times before, no legal actions taken, ever.
There's nothing "fraud" and "scam" about it as long as you tell the specs of the cards as they are.
Annoying, bad and whatnot? Yes. Illegal? Definitely not.
 
Hynix documentation points to that size (256B) for the access granularity. Is that different from the burst length?

No. Access granularity should mean the smallest individual access you can make, and based on what I understand about HBM, it should be 32B. 256B is what you get if you bind all the command lines of the channels of a stack together and access them in parallel, but who would actually do that?

Thanks for the info, BTW do you have a link to said presentation?

My info is from: http://www.cs.utah.edu/thememoryforum/mike.pdf

thememoryforum said:
Mike O'Connor, NVIDIA and UT Austin, Some Highlights of the High-Bandwidth Memory (HBM) Standard
Abstract: The High-Bandwidth Memory (HBM) standard was recently finalized by JEDEC. This stacked-memory specification will enable significantly higher-bandwidth systems in the near future. This talk will present a brief overview of the HBM standard, focusing primarily on the interface with the host processor/memory controller. Aspects of the interface that are different than earlier DDR/GDDR memories, and some of the rationale for these new features, will be highlighted.
Bio: Mike O'Connor is a Senior Research Scientist at NVIDIA where his research focuses on future GPU processor and memory architectures. Mike previously worked at AMD Research, where, among other things, he was involved in many aspects of the development of the HBM standard (including writing the initial draft specification document). Prior to AMD, Mike was in the product architecture group at NVIDIA where he was the lead memory system architect for several generations of NVIDIA GPUs -- including the first NVIDIA GPUs with GDDR5 support. Mike has also architected network processors at start-up Silicon Access Networks, an ARM processor core at Texas Instruments, and the picoJava cores at Sun. Mike has been granted 40 patents. He has a BSEE from Rice University and an MSEE from the University of Texas at Austin. Mike is currently working towards finishing his long-delayed PhD at UT-Austin. He is a Senior Member of the IEEE and a member of the ACM.

I would trust that presentation over a random marketing slide.
 
Since the Hynix slide confused me, I just downloaded JESD235 and had a look. The standard clearly states:

The HBM DRAM is tightly coupled to the host compute die with a distributed interface. The interface is divided into independent channels. Each channel is completely independent of one another. Channels are not necessarily synchronous to each other.

...

2n prefetch architecture with 256 bits per memory read and write access
BL = 2 and 4
128 DQ width + Optional ECC pin support/channel

So the nV presentation is correct. Each individual access is 32 bytes.
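
To put numbers on it, here's the arithmetic as a small Python sketch. The constants are the ones from the JESD235 excerpt above (128 DQ per channel, BL = 2 or 4) plus the 8 channels per stack from the slide quoted earlier; the function name is just for illustration.

```python
DQ_BITS_PER_CHANNEL = 128   # data pins per channel (excluding optional ECC)
CHANNELS_PER_STACK = 8      # independent channels per stack


def access_bytes(bus_width_bits, burst_length):
    """Access granularity = bus width * burst length, converted to bytes."""
    return bus_width_bits * burst_length // 8


per_channel_bl2 = access_bytes(DQ_BITS_PER_CHANNEL, 2)   # 256 bits per access
per_channel_bl4 = access_bytes(DQ_BITS_PER_CHANNEL, 4)
stack_width_bits = DQ_BITS_PER_CHANNEL * CHANNELS_PER_STACK
# Only relevant if all 8 channels were driven in lockstep.
all_channels_bl2 = per_channel_bl2 * CHANNELS_PER_STACK

print(per_channel_bl2, per_channel_bl4, stack_width_bits, all_channels_bl2)
```

That gives 32 bytes per channel at BL = 2 (64 at BL = 4) and a 1024-bit stack, and the 256-byte figure only appears if you add up all eight channels of a stack.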
 
There's also the Tonga-based R9 M295X that can do 2048 × 2 × 0.850 = 3,481.6 GFLOPS at 125W, or 27.85 GFLOPS/W. Granted, it's a mobile SKU running at a lower clock speed, but it gives you some idea of what might be possible with a lot of binning, and it's based on the latest iteration of GCN.

For what it's worth: 1.43 × 300 × 18.67/27.85 = 288W.

This is true, but R9 M295X is a very low volume, niche product. The mainstream Tonga part (R9 285) gets 17.4 GFlops/W - worse than Hawaii. 1.43*300*18.67/17.4 = 460W.

I'm guessing Fiji will get somewhere around 22-23 GFlops/W.
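
Here's a quick sweep over the efficiency figures thrown around in this thread, using the same formula with no memory savings subtracted. Rough sketch only: the labels and the helper name are made up, and the 22 and 23 GFlops/W rows are just the guess above, not a known spec.

```python
PERF_RATIO = 1.43     # Titan X vs 290X (TPU)
BASE_POWER_W = 300.0  # 290X
BASE_EFF = 18.67      # 290X, GFlops/W

# Efficiency figures (GFlops/W) quoted in the thread.
candidates = {
    "R9 285 (Tonga)": 17.4,
    "290X (Hawaii)": 18.67,
    "Fiji guess, low": 22.0,
    "295X2 (Hawaii)": 22.6,
    "Fiji guess, high": 23.0,
    "R9 M295X (Tonga)": 27.85,
}


def required_power_w(target_eff):
    """Power to hit PERF_RATIO x 290X performance at a given GFlops/W."""
    return PERF_RATIO * BASE_POWER_W * BASE_EFF / target_eff


estimates = {name: required_power_w(eff) for name, eff in candidates.items()}

for name, watts in estimates.items():
    print(f"{name:<18} {candidates[name]:>6.2f} GFlops/W -> {watts:4.0f} W")
```

So the 22-23 GFlops/W guess works out to roughly 350-365 W before any memory interface savings.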
 
Reading more into the standard, it seems to me that HBM2 is a pure marketing name for higher speed/capacity class products. That is, the HBM standard is designed to be flexible enough to be used as-is for the next 5-10 years or so, and any gpu designed now should be able to fit the later, larger HBM chips when they become available. (But of course only being able to use them at the speed that their controller is able to.)
 
We discussed this a couple of times already, but as we near Computex things are getting more and more clear about AMD's Radeon line-up. Yes, there will be a new product with HBM memory, but it's not going to be called a R9 390 or 390X.

See, the AMD Radeon R9 390, R9 380, R9 370 and R9 360 Series will be respin products. New info makes it abundantly clear: the R9 390, for example, will be Hawaii based (R9 290). On the ASUS forums the following entries appeared:
  • ASUS R9390X-DC2-8GD5
  • ASUS STRIX-R9380-OC-2GD5
  • ASUS STRIX-R9370-OC-4GD5
  • ASUS STRIX-R7360X-DC2OC2-2GD5
  • ASUS R7360-2GD5
We know for a fact that HBM is limited to 4 GB graphics memory so that 8GB series we see in the listing can't be Fiji or anything HBM (High Bandwidth Memory) based, hence the one puzzle we needed to solve was what would AMD do with Hawaii? Well, the 390 must be the Hawaii GPU refresh, yet tied to 8GB of graphics memory and likely a few tweaks on GPU and memory.

The AMD Radeon R9 380 is based on Tonga (Radeon R9 285) and has 1792 stream processors. The AMD Radeon R9 370 OEM is based on “Pitcairn” and has 1024 stream processors (Radeon 7800 / R9 270X). Then the low-end R9 360 OEM (Radeon 7790 / Radeon R7 260) is based on Bonaire and gets 768 stream processors. Hawaii of course has 2816 shader processors.

http://www.guru3d.com/news-story/amd-radeon-r9-390r9-380r9-370-and-r9-360-series-rebrands.html
 
Reading more into the standard, it seems to me that HBM2 is a pure marketing name for higher speed/capacity class products. That is, the HBM standard is designed to be flexible enough to be used as-is for the next 5-10 years or so, and any gpu designed now should be able to fit the later, larger HBM chips when they become available. (But of course only being able to use them at the speed that their controller is able to.)

There's a lot of text to go through, but one thing that may show up outside of the spec as it stands is the proposed pseudo-channel mode in HBM2.
That modifies what the stack does at a half-channel granularity, by sending more commands to the stack than is necessary for a burst length of 4.
 