AMD: Pirate Islands (R* 3** series) Speculation/Rumor Thread

Why? If the rumors have some truth to them, massively more bandwidth and much reduced latency should help increase utilization, along with improvements to the ALUs themselves and a 500 mm²+ GPU with ~4k ALUs.
Even with 4k ALUs, that is just 40% more than a 290X. I don't think that even with the massive increase in bandwidth and reduction in latency they will get more than a 50% performance increase over the 290X. If Tonga is anything to go by, they didn't significantly increase performance per ALU.
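As a quick sanity check on the ALU math, assuming the rumored "~4k ALUs" means 4096 (64 CUs at 64 SPs each — an assumption, not a confirmed spec):

```python
# R9 290X (Hawaii): 44 CUs x 64 SPs. Rumored part assumed at 64 CUs x 64 SPs.
r290x_alus = 44 * 64           # 2816
rumored_alus = 64 * 64         # 4096 (assumed)
increase = rumored_alus / r290x_alus - 1
print(f"{increase:.0%}")       # -> 45%
```

So the raw ALU-count increase is roughly 40-45% depending on how "~4k" cashes out.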
 
Even with 4k ALUs, that is just 40% more than a 290X. I don't think that even with the massive increase in bandwidth and reduction in latency they will get more than a 50% performance increase over the 290X. If Tonga is anything to go by, they didn't significantly increase performance per ALU.

Well, Tonga, at least on the SP side, is no different from Hawaii; it is a Hawaii-based core with some additional features (new compression algorithm + HSA).

That said, a 200% increase over the current generation seems (to me) way too much.
 
Even with 4k ALUs, that is just 40% more than a 290X. I don't think that even with the massive increase in bandwidth and reduction in latency they will get more than a 50% performance increase over the 290X. If Tonga is anything to go by, they didn't significantly increase performance per ALU.
Why couldn't ~double the bandwidth plus delta compression result in 10%+ higher performance? (By the way, Tonga has lower clocks and lower bandwidth than the R9 280.)
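As a rough sketch of what "double bandwidth plus delta compression" could mean for effective bandwidth, using purely assumed numbers (320 GB/s for the 290X, a hypothetical 640 GB/s HBM part, and a guessed 1.2x average compression gain — none of these are confirmed figures):

```python
def effective_bw(raw_gb_s, compression_ratio):
    """Bandwidth the GPU effectively sees if compressible traffic shrinks."""
    return raw_gb_s * compression_ratio

r290x = effective_bw(320, 1.0)   # 290X: 320 GB/s, no delta compression
hbm = effective_bw(640, 1.2)     # hypothetical HBM part with assumed 1.2x gain
print(hbm / r290x)               # -> 2.4
```

Of course, real delta-compression gains only apply to compressible (mostly render-target) traffic, so the multiplier varies heavily by workload.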
 
Even with 4k ALUs, that is just 40% more than a 290X. I don't think that even with the massive increase in bandwidth and reduction in latency they will get more than a 50% performance increase over the 290X. If Tonga is anything to go by, they didn't significantly increase performance per ALU.

And how much did that reduce LATENCY by? How often is an ALU stalled? How much will HBM reduce the time an ALU is stalled for, thus increasing utilization? How often in a frame are you limited by the worst-case memory fetch?

The idea that you need hundreds of GB/s of bandwidth, and yet delivering that data at something like half the latency of GDDR5 won't improve anything across the entire GPU pipeline, doesn't compute to me as a layman. GPUs have a very limited ability to hide latency, which in turn has to have an effect on utilization.
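A crude Little's-law sketch of the point being made here, with illustrative numbers only (the latencies and the per-wavefront issue interval are assumptions, not measurements): the amount of parallel work a GPU needs in flight to hide memory latency scales with that latency, so halving latency helps exactly when occupancy is too low to cover it.

```python
def wavefronts_needed(latency_cycles, issue_interval_cycles=4):
    # If one wavefront can issue a new memory request roughly every
    # `issue_interval_cycles`, the latency must be covered by other
    # wavefronts running in the meantime (Little's law).
    return latency_cycles / issue_interval_cycles

print(wavefronts_needed(400))   # GDDR5-ish latency -> 100.0
print(wavefronts_needed(200))   # hypothetical halved latency -> 50.0
```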
 
Does HBM actually have reduced latency?
Much of the latency of a high-throughput DRAM controller comes from the transaction sorter, and HBM is fundamentally still the same storage technology as GDDR5, so I don't expect that to be very different.

Is that incorrect?
 
Unless they've found a massive amount of performance or somehow increased clocks by 25%+ along with that massive die, that has got to be fake. Also, don't records usually update online? I just checked and I don't think it's there. Unless, of course, there's something that benchmark particularly favours and AMD has found out what it is, so the score is inflated; although that's probably a load of rubbish on my part.
 
Compare to this result for 290CF: http://www.3dmark.com/3dm11/9151266

One thing is that while the overall score is the same, the individual sub-tests are simply too close (within ~5%); if one chip made such a jump in performance, you would expect much more variation in the performance gain across different workloads.
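To illustrate the reasoning with made-up sub-test scores (these numbers are purely hypothetical, not taken from the linked result): if a suspect result tracks a baseline within a near-uniform factor across all sub-tests, that is more consistent with a rescaled copy than with a genuinely different architecture, which would normally gain unevenly across workloads.

```python
suspect = [52.0, 48.5, 61.2, 44.9]    # hypothetical sub-test scores
baseline = [50.0, 47.0, 59.0, 43.0]   # hypothetical 290 CF scores
gains = [s / b for s, b in zip(suspect, baseline)]
spread = max(gains) - min(gains)
print(f"spread of per-subtest gains: {spread:.2%}")  # low spread = suspicious
```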
 
Sounds way too powerful considering the process hasn't changed.
For what it's worth, I think AMD has a lot more to gain from "delta compression" than Nvidia did with its recent chips. I say that on the basis that Nvidia has generally done more with less bandwidth for quite a while now.

Also, are we really expecting the next high-end chip from AMD to have HBM? It seemed to me that AMD's next chip is an interim chip with GDDR5, and that the HBM chip arrives towards the end of this year.

Well, that's what I'd convinced myself would happen.
 
And how much did that reduce LATENCY by? How often is an ALU stalled? How much will HBM reduce the time an ALU is stalled for, thus increasing utilization? How often in a frame are you limited by the worst-case memory fetch?

The idea that you need hundreds of GB/s of bandwidth, and yet delivering that data at something like half the latency of GDDR5 won't improve anything across the entire GPU pipeline, doesn't compute to me as a layman. GPUs have a very limited ability to hide latency, which in turn has to have an effect on utilization.

I've tried cross-referencing data between the various architectures in the console forum:
Xbox One (Durango) Technical hardware investigation

The DRAM itself seems to be a minority contributor to the cost of memory access for AMD's architectures (especially the APUs). CPU cache comparisons put the external DRAM device's contribution at 60-80 CPU cycles, or 30-40 cycles in terms of the GPU's slower clock.

Using the GPU clock regime, a probable best case would be an on-die pool with dedicated read and write paths, which the ESRAM in the Xbox One is. That's 250+ cycles for DDR3 versus 125+ for ESRAM, once the additive latencies from the caches you have to miss through are added in. I suspect that even if HBM is an improvement, it may have difficulty dropping latency below that of an on-die SRAM pool.
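To sanity-check the clock-regime conversion above, assuming Xbox One-like clocks (1.75 GHz CPU, 853 MHz GPU — the exact figures are taken as given here, not from this thread):

```python
cpu_ghz, gpu_ghz = 1.75, 0.853
for cpu_cycles in (60, 80):
    ns = cpu_cycles / cpu_ghz          # convert CPU cycles to wall time
    gpu_cycles = ns * gpu_ghz          # then to the slower GPU clock
    print(f"{cpu_cycles} CPU cycles ~ {ns:.0f} ns ~ {gpu_cycles:.0f} GPU cycles")
```

That maps the 60-80 CPU-cycle DRAM contribution onto roughly 29-39 GPU cycles, matching the 30-40 figure quoted above.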


Because HBM is DRAM on a read/write bus, it will still need to deal with the same coalescing requirements and the various latencies and penalties GDDR5 has for irregular traffic. Possibly the larger number of buses allows the memory controller more leeway in working around turnaround penalties. Maybe the slower bus and shorter wires of the interposer can reduce the bus turnaround penalties, at least as seen by the lower-clocked controller.
HBM's banking has some high-level similarities to GDDR5's (not that DRAM gives much leeway here), which points to AMD not wanting a memory technology too different from what came before. So a general similarity to the DRAM it's replacing probably means it won't beat the ESRAM in latency, and ESRAM shows itself to be a nice but not earth-shattering improvement for the GPU from a vector-memory standpoint.
 
I've heard from some "insider" that the 390 will be unbelievably huge. Of course this could be a hoax, but do you think it's possible that AMD is using an interposer to arrange the HBM bus, and that this is causing confusion?
 
People have been confused before by the extra board space allocated for the chip package on PCB schematics, so your mileage may vary.
 
I've heard from some "insider" that the 390 will be unbelievably huge. Of course this could be a hoax, but do you think it's possible that AMD is using an interposer to arrange the HBM bus, and that this is causing confusion?
Most recent rumors have suggested that AMD is finally using an interposer to get them all nice and cosy close together.
 
I've heard from some "insider" that the 390 will be unbelievably huge. Of course this could be a hoax, but do you think it's possible that AMD is using an interposer to arrange the HBM bus, and that this is causing confusion?

Maybe. But Hawaii is already 438mm², so it pretty much has to be very big if it is meant to be significantly faster.
 