"Running a Titan X 40% faster seems to be a bit of a stretch."
Why? We know it can clock at 1.4 GHz. Slap a water cooler on it and I think 30-40% over Titan X is within reach.
TPU has Titan X performing 43% faster than 290X. So if we scale up Hawaii's shader array to match Titan X performance (taking 300 W for the 290X) and subtract 30 W for memory interface savings, we get (1.43 * 300) - 30 = 400 W.
However, 290X is not the most power efficient Hawaii SKU - it gets 18.67 GFlops/W, while 295X2 gets 22.6 GFlops/W. So let's assume Fiji gets the same power efficiency as the best Hawaii GPUs. Then Fiji burns (1.43 * 300 * 18.67/22.6) - 30 = 325 W.
So, it seems that for Fiji to beat Titan X, it will need somewhere between 325 and 400 W (assuming AMD doesn't have other performance/watt improvements in their shader array).
I'm guessing that Fiji will be a 350 W part, and more or less match Titan X performance. I also think Nvidia will allow its partners to release various overclocked GM200 based parts that are 15-40% faster than Titan X (at correspondingly increased power budgets).
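The two estimates above can be reproduced with a few lines of Python. This is only a rough sketch of the back-of-the-envelope scaling in the post: the function and constant names are made up for illustration, and the inputs (43% performance gap, 300 W for the 290X, 30 W memory interface savings, 18.67 and 22.6 GFlops/W) are the post's own figures, not measurements.

```python
def scaled_power_watts(perf_ratio, base_power_w, base_eff, target_eff,
                       memory_savings_w=0.0):
    """Scale a baseline board power to a target performance level.

    perf_ratio        target performance relative to the baseline card
    base_power_w      baseline board power in watts
    base_eff          baseline efficiency (GFlops/W)
    target_eff        assumed efficiency of the scaled-up part (GFlops/W)
    memory_savings_w  watts assumed saved by the new memory interface
    """
    return perf_ratio * base_power_w * (base_eff / target_eff) - memory_savings_w


# Figures taken from the post (hypothetical constant names).
TITAN_X_OVER_290X = 1.43
R9_290X_POWER_W = 300.0
HBM_SAVINGS_W = 30.0
EFF_290X = 18.67    # GFlops/W
EFF_295X2 = 22.6    # GFlops/W

# Case 1: Fiji is no more efficient than the 290X.
upper = scaled_power_watts(TITAN_X_OVER_290X, R9_290X_POWER_W,
                           EFF_290X, EFF_290X, HBM_SAVINGS_W)

# Case 2: Fiji matches the 295X2's efficiency.
lower = scaled_power_watts(TITAN_X_OVER_290X, R9_290X_POWER_W,
                           EFF_290X, EFF_295X2, HBM_SAVINGS_W)

print(f"{lower:.0f} W to {upper:.0f} W")
```

This gives roughly 324 W and 399 W, which the post rounds to 325 W and 400 W.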
With regards to HBM, does anyone know what the burst transfer length is? All I know is the width per stack is 1024 bits.
256 Bytes.
from an nvidia slide said:
Each HBM stack provides 8 independent memory channels
These are completely independent memory interfaces
Independent clocks & timing
Independent commands
Independent memory arrays
In short, nothing one channel does affects another channel
"No. Each stack logically shows up as 8 independent channels, each providing 2-clock bursts, or 32 bytes per channel."
Hynix documentation points to that size (256B) for the access granularity. Is that different from the burst length?
"No. Each stack logically shows up as 8 independent channels, each providing 2-clock bursts, or 32 bytes per channel."
Thanks for the info, BTW do you have a link to said presentation?
"Hynix documentation points to that size (256B) for the access granularity. Is that different from the burst length?"
access granularity = bus width * burst length
"The mass rebranding - it is a fraud and a scam. That's enough to take legal action against AMD."
You're clearly new to the industry - both NVIDIA and AMD have done it several times before, no legal actions taken, ever.
"Thanks for the info, BTW do you have a link to said presentation?"
Most likely this one:
thememoryforum said:
Mike O'Connor, NVIDIA and UT Austin, Some Highlights of the High-Bandwidth Memory (HBM) Standard
Abstract: The High-Bandwidth Memory (HBM) standard was recently finalized by JEDEC. This stacked-memory specification will enable significantly higher-bandwidth systems in the near future. This talk will present a brief overview of the HBM standard, focusing primarily on the interface with the host processor/memory controller. Aspects of the interface that are different than earlier DDR/GDDR memories, and some of the rationale for these new features, will be highlighted.
Bio: Mike O'Connor is a Senior Research Scientist at NVIDIA where his research focuses on future GPU processor and memory architectures. Mike previously worked at AMD Research, where, among other things, he was involved in many aspects of the development of the HBM standard (including writing the initial draft specification document). Prior to AMD, Mike was in the product architecture group at NVIDIA where he was the lead memory system architect for several generations of NVIDIA GPUs -- including the first NVIDIA GPUs with GDDR5 support. Mike has also architected network processors at start-up Silicon Access Networks, an ARM processor core at Texas Instruments, and the picoJava cores at Sun. Mike has been granted 40 patents. He has a BSEE from Rice University and an MSEE from the University of Texas at Austin. Mike is currently working towards finishing his long-delayed PhD at UT-Austin. He is a Senior Member of the IEEE and a member of the ACM.
The HBM DRAM is tightly coupled to the host compute die with a distributed interface. The interface is divided into independent channels. Each channel is completely independent of one another. Channels are not necessarily synchronous to each other.
...
2n prefetch architecture with 256 bits per memory read and write access
BL = 2 and 4
128 DQ width + Optional ECC pin support/channel
"So the nV presentation is correct. Each individual access is 32 bytes."
Cool! Good to see that cleared up!
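For what it's worth, the 32-byte and 256-byte figures both fall out of the "access granularity = bus width * burst length" rule once you pick the bus width. A rough sketch, using only the numbers quoted above (128 DQ per channel, 8 channels per stack, BL = 2 and 4); the function and constant names are made up for illustration, and reading the 256B figure as "all eight channels at BL = 2" is an assumption rather than something the quoted documents spell out:

```python
def access_bytes(bus_width_bits, burst_length):
    """Access granularity in bytes: bus width times burst length."""
    return bus_width_bits * burst_length // 8


# Figures from the quoted excerpts (hypothetical constant names).
CHANNEL_WIDTH_BITS = 128   # DQ width per channel
CHANNELS_PER_STACK = 8     # independent channels per HBM stack

per_channel_bl2 = access_bytes(CHANNEL_WIDTH_BITS, 2)   # one channel, BL = 2
per_channel_bl4 = access_bytes(CHANNEL_WIDTH_BITS, 4)   # one channel, BL = 4

# Assumption: treat the whole stack as one 1024-bit bus at BL = 2.
stack_width_bits = CHANNEL_WIDTH_BITS * CHANNELS_PER_STACK
whole_stack_bl2 = access_bytes(stack_width_bits, 2)

print(per_channel_bl2, per_channel_bl4, whole_stack_bl2)
```

Per channel that is 32 bytes at BL = 2 (the "256 bits per memory read and write access" line) and 64 bytes at BL = 4; 256 bytes only appears if all eight channels are counted together.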
There's also the Tonga-based R9 M295X that can do 2048 × 2 × 0.850 = 3,481.6 GFLOPS at 125W, or 27.85 GFLOPS/W. Granted, it's a mobile SKU running at a lower clock speed, but it gives you some idea of what might be possible with a lot of binning, and it's based on the latest iteration of GCN.
For what it's worth: 1.43 × 300 × 18.67/27.85 = 288W.
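The same numbers in Python, as a quick sanity check. This is a sketch only: the names are made up for illustration, the inputs are the figures quoted in this thread, and it mirrors the formula exactly as written above, which leaves out the 30 W memory interface saving used in the earlier 325-400 W estimate.

```python
def peak_gflops(shaders, clock_ghz, flops_per_clock=2):
    """Peak single-precision GFLOPS: shaders x FLOPs per clock x clock (GHz)."""
    return shaders * flops_per_clock * clock_ghz


# R9 M295X figures as quoted in the post.
m295x_gflops = peak_gflops(2048, 0.850)
m295x_eff = m295x_gflops / 125.0           # GFLOPS per watt at 125 W

# Scale the 290X (300 W, 18.67 GFlops/W) to Titan X performance (1.43x)
# at the M295X's efficiency; no 30 W memory saving is subtracted here.
fiji_estimate_w = 1.43 * 300.0 * 18.67 / m295x_eff

print(f"{m295x_gflops:.1f} GFLOPS, {m295x_eff:.2f} GFLOPS/W, {fiji_estimate_w:.0f} W")
```

Subtracting the same 30 W as before would put this case at around 258 W.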
We discussed this a couple of times already, but as we near Computex things are getting more and more clear about AMD's Radeon line-up. Yes, there will be a new product with HBM memory, but it's not going to be called an R9 390 or 390X.
See, the AMD Radeon R9 390, R9 380, R9 370 and R9 360 Series will be respin products. New info makes it abundantly clear that the R9 390, for example, will be Hawaii based (R9 290). On the ASUS forums the following entries appeared:
- ASUS R9390X-DC2-8GD5
- ASUS STRIX-R9380-OC-2GD5
- ASUS STRIX-R9370-OC-4GD5
- ASUS STRIX-R7360X-DC2OC2-2GD5
- ASUS R7360-2GD5
We know for a fact that HBM is limited to 4 GB of graphics memory, so the 8GB series we see in the listing can't be Fiji or anything HBM (High Bandwidth Memory) based. Hence the one puzzle we needed to solve was what AMD would do with Hawaii. Well, the 390 must be the Hawaii GPU refresh, yet tied to 8GB of graphics memory and likely a few tweaks on GPU and memory.
The AMD Radeon R9 380 is based on Tonga (Radeon R9 285) and has 1792 stream processors. The AMD Radeon R9 370 OEM is based on “Pitcairn” (Radeon 7800 / R9 270X) and has 1024 stream processors. Then the low-end R9 360 OEM (Radeon 7790 / Radeon R7 260) is based on Bonaire and gets 768 stream processors. Hawaii of course has 2816 shader processors.
Reading more into the standard, it seems to me that HBM2 is a pure marketing name for higher speed/capacity class products. That is, the HBM standard is designed to be flexible enough to be used as-is for the next 5-10 years or so, and any GPU designed now should be able to fit the later, larger HBM chips when they become available (though of course only at the speed its memory controller can handle).