AMD: Speculation, Rumors, and Discussion (Archive)

Status
Not open for further replies.
2.27x the GPU throughput and a 31% higher CPU clock, relying on only a 23% bandwidth boost. Probably an inevitable compromise for a mid-term upgrade.
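For anyone who wants to check the arithmetic, here's a minimal Python sketch. It assumes the launch PS4's 18 CUs at 800 MHz, 1.6 GHz CPU and 176 GB/s, plus the rumored 36 CUs at 911 MHz, 2.1 GHz CPU and 218 GB/s for the new model (the CPU clock and bandwidth are taken from the leak, not from this post), and it counts peak FP32 as 64 lanes × 2 FLOPs (one FMA) per CU per clock. On those assumptions the GPU gain is about 2.28x (roughly 128% more) and the bandwidth gain rounds to about 24%.

```python
def gcn_tflops(cus: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS of a GCN GPU: 64 lanes per CU, one FMA (2 FLOPs) per lane per clock."""
    return cus * 64 * 2 * clock_mhz * 1e6 / 1e12

# Launch PS4 (public specs).
base = {"gpu_tflops": gcn_tflops(18, 800), "cpu_ghz": 1.6, "bw_gbs": 176.0}
# Rumored upgrade; CPU clock and bandwidth are ASSUMED from the leak.
neo = {"gpu_tflops": gcn_tflops(36, 911), "cpu_ghz": 2.1, "bw_gbs": 218.0}

ratios = {key: neo[key] / base[key] for key in base}
for key, r in ratios.items():
    print(f"{key}: {r:.2f}x ({(r - 1) * 100:.0f}% more)")
```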
 
2.27x the GPU throughput and a 31% higher CPU clock, relying on only a 23% bandwidth boost. Probably an inevitable compromise for a mid-term upgrade.

Inevitably a compromise, and how big a one depends on how much less bandwidth-"hungry" Polaris turns out to be. In other words, it could be an occasionally compressed compromise ;)
 
2.27x the GPU throughput and a 31% higher CPU clock, relying on only a 23% bandwidth boost. Probably an inevitable compromise for a mid-term upgrade.
Raw GPU performance is less than 6% higher than the R9 380X's, while the bandwidth is 20% higher.
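A quick sketch of that comparison, assuming the R9 380X's reference 32 CUs at 970 MHz with a 256-bit bus of 5.7 Gbps GDDR5, and the rumored 218 GB/s for the console (an assumption from the leak). Peak FP32 is again 64 lanes × 2 FLOPs per CU per clock; the console's shared memory pool isn't modelled here.

```python
def gcn_tflops(cus: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS: 64 lanes per CU, one FMA (2 FLOPs) per lane per clock."""
    return cus * 64 * 2 * clock_mhz / 1e6

def gddr_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_bits * gbps_per_pin / 8

console_tf, console_bw = gcn_tflops(36, 911), 218.0   # bandwidth ASSUMED from the leak
r9_380x_tf, r9_380x_bw = gcn_tflops(32, 970), gddr_gbs(256, 5.7)

compute_gain = (console_tf / r9_380x_tf - 1) * 100
bandwidth_gain = (console_bw / r9_380x_bw - 1) * 100
print(f"compute +{compute_gain:.1f}%, bandwidth +{bandwidth_gain:.1f}%")
# compute +5.7%, bandwidth +19.5%
```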
 
All of a sudden, Polaris 11 doesn't offer "console-level performance" in a subnotebook form factor.

Crap, I had decided to be an early adopter this generation because it seemed promising enough. First the initial line-up isn't that great, then Plus becomes crap, and finally, in the year when the really interesting exclusives start appearing (Uncharted 4, FF VII R, PSVR, Horizon Zero Dawn, Persona 5, etc.), Sony launches a new console that all these high-profile new games/platforms will certainly take advantage of.

Meh...

Another interesting metric is that this NEW(O) PS4, if these specs are right, has over 3x the GPU performance of the XBone.
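The ratio is easy to reproduce, assuming the Xbox One's 12 CUs at 853 MHz and the rumored 36 CUs at 911 MHz, and counting peak FP32 only (the Xbox One's ESRAM and any architectural differences are ignored in this sketch).

```python
# Peak FP32 in GFLOPS = CUs x 64 lanes x 2 FLOPs (FMA) x clock in GHz.
neo_gflops = 36 * 64 * 2 * 0.911       # rumored: 36 CUs at 911 MHz
xbox_one_gflops = 12 * 64 * 2 * 0.853  # Xbox One: 12 CUs at 853 MHz

ratio = neo_gflops / xbox_one_gflops
print(f"{ratio:.2f}x")  # 3.20x
```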
 
I'm guessing MS won't just follow but will try to leapfrog it?
Leapfrogging Sony with their 32MB SRAM baggage they have to haul around for compatibility is going to be fairly expensive.

2.27x the GPU throughput and a 31% higher CPU clock, relying on only a 23% bandwidth boost.
Sourcing high-speed GDDR for a console potentially selling millions per quarter would probably be a significant headache, so they're settling again for middle-of-the-road-ish GDDR. :)
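For reference, GDDR5 bandwidth is just bus width times per-pin data rate. A minimal sketch, assuming the new model keeps the PS4's 256-bit bus and that the rumored 218 GB/s figure is right (both assumptions); the 8 Gbps line is only a comparison point for the faster GDDR5 speed grade on sale at the time.

```python
def bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: each pin moves `gbps_per_pin` gigabits per second."""
    return bus_width_bits * gbps_per_pin / 8

def pin_rate_for(target_gbs: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbps) needed to reach `target_gbs` on a given bus."""
    return target_gbs * 8 / bus_width_bits

print(bandwidth_gbs(256, 5.5))   # launch PS4: 176.0 GB/s
print(pin_rate_for(218.0, 256))  # rumored upgrade: 6.8125 Gbps per pin
print(bandwidth_gbs(256, 8.0))   # 8 Gbps GDDR5 on the same bus: 256.0 GB/s
```

So 218 GB/s on a 256-bit bus implies roughly 6.8 Gbps memory, well short of 8 Gbps parts, which fits the "middle-of-the-road-ish" reading.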
 
http://www.giantbomb.com/articles/sources-the-upgraded-playstation-4-is-codenamed-ne/1100-5437/
Polaris 10

Improved AMD GCN, 36 CUs at 911 MHz

So the new semi-custom wins are the updated PS4 and the Nintendo NX. Just console replacements or upgrades.

There is still one semi-custom win with an unnamed customer (Apple? MS?). They won two in 2014 and one in the middle of last year.

One interesting thing about this is that AMD is going to shrink the Puma cores to 14nm. Is this the first news we have about this?
 
One interesting thing about this is that AMD is going to shrink the Puma cores to 14nm. Is this the first news we have about this?
In fact, it's not the first news.

AMD was working on the Tiger (16nm FF) and Cheetah (14nm FF) projects in at least 2014 and 2015, respectively. The codenames fit AMD's feline core line pretty well.

However, there have been many scrapped projects in this particular family in the past, for instance Lion (28nm BT?), Margay, or Leopard.
 
I don't think the scalar unit in GCN has much to do with AMD's Cat cores.

On that topic, some of the proposed changes for SIMD or wavefront handling in the patents might have implications for GCN as we know it. Items like the high-performance vector/scalar units could break code that has baked in wait states or additional vector operations for some of the aliased registers and sourcing between the scalar and vector domains.

A high-performance unit that breaks the 4-cycle cadence makes the vector path seem faster relative to the scalar path, and the variable-SIMD patent assumes a tighter latency for updating the front end with predication information. The former could cause code to misbehave unless the CU's domain-crossing penalties are reduced or properly interlocked. The latter would be more transparent since it speeds up forwarding of information.
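A toy model of why baked-in padding can break. In GCN as shipped, a wave64 vector instruction occupies a 16-lane SIMD for 4 cycles, so padding counted in instructions is implicitly counted in 4-cycle units. The sketch assumes a hypothetical 64-lane "high-performance" SIMD that finishes a wave in 1 cycle and a hypothetical 8-cycle forwarding latency; neither number comes from the patents or an AMD manual.

```python
import math

WAVE_SIZE = 64
FORWARD_LATENCY = 8  # HYPOTHETICAL cycles before a result can be sourced by another domain

def cycles_per_vector_op(simd_lanes: int) -> int:
    """Cycles a wave64 instruction occupies a SIMD of the given width."""
    return math.ceil(WAVE_SIZE / simd_lanes)

def padding_is_safe(simd_lanes: int, padding_instrs: int,
                    latency: int = FORWARD_LATENCY) -> bool:
    """Producer issues at cycle 0; the consumer issues once the producer and
    `padding_instrs` independent instructions have each used their issue slot."""
    consumer_issue_cycle = (padding_instrs + 1) * cycles_per_vector_op(simd_lanes)
    return consumer_issue_cycle >= latency

def padding_needed(simd_lanes: int, latency: int = FORWARD_LATENCY) -> int:
    """Smallest number of independent instructions that covers the latency."""
    return max(0, math.ceil(latency / cycles_per_vector_op(simd_lanes)) - 1)

# Padding tuned for the SIMD16 cadence (1 instruction = 4 cycles) ...
pad = padding_needed(16)
print(pad, padding_is_safe(16, pad))   # 1 True
# ... is no longer enough if the wave lands on the hypothetical 64-lane SIMD.
print(padding_is_safe(64, pad))        # False
print(padding_needed(64))              # 7
```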

Vector ops that source from a scalar register or from the LDS start to look different too if the CU dynamically puts them on a higher issue rate. That would be less a matter of correctness than of contention for shared resources, if those resources remain shared at the same ratios and with the same capabilities they have now.
 
On that topic, some of the proposed changes for SIMD or wavefront handling in the patents might have implications for GCN as we know it. Items like the high-performance vector/scalar units could break code that has baked in wait states or additional vector operations for some of the aliased registers and sourcing between the scalar and vector domains.
Maybe, but that should still largely fall on the compiler to handle and detect: just opt out of using it unless it's specifically called for. The basic sync primitive they had seemed capable. That flexible scalar is a bit more robust than the SM 6.0 wave-level operations I was expecting. The possibilities it opens likely make it worth reworking any compute code, as the models would change dramatically and be easier to implement.

There's no reason you couldn't have a scalar unit for the CU managing all the scheduling, dynamically creating waves and rearranging them as necessary. Bonus points for an ARM core that could handle all the CUs at once. Didn't they mention something like that as a cost-saving measure for internal testing?
 
Maybe, but that should still largely fall on the compiler to handle and detect: just opt out of using it unless it's specifically called for. The basic sync primitive they had seemed capable. That flexible scalar is a bit more robust than the SM 6.0 wave-level operations I was expecting. The possibilities it opens likely make it worth reworking any compute code, as the models would change dramatically and be easier to implement.
There are wait counts for the various memory accesses and exports, but NOP or independent instruction padding handles the class of operations that involve sourcing or forwarding between vector and scalar elements. It would be an unenviable bug to have to trace through for a problem that dynamically may or may not occur based on what SIMD the scheduler chooses. The alternative can be an excessively pessimistic compiler that inserts enough NOPs to satisfy the worst case where a wait state of several 4-cycle vector issues gets multiplied by a 4x scalar issue rate.
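To make the pessimistic-compiler point concrete, here's a toy hazard-padding pass. Everything about it is hypothetical: the wait-state table is illustrative rather than taken from an AMD ISA manual, register names are simplified (s0 stands in for an SGPR pair), each s_nop is treated as one wait state (real GCN s_nop takes an immediate), and `slot_scale` stands in for a scheduler that might place the wave on a unit issuing 4x faster, which a compiler that can't know the assignment has to assume for every hazard.

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    op: str
    domain: str                            # "valu" or "salu"
    dst: set = field(default_factory=set)  # registers written
    src: set = field(default_factory=set)  # registers read

# HYPOTHETICAL wait states (in issue slots) when a consumer in one domain
# reads a register written by a producer in another; illustrative values only.
WAIT_STATES = {
    ("valu", "salu"): 4,
    ("salu", "valu"): 1,
    ("valu", "valu"): 0,
    ("salu", "salu"): 0,
}

def insert_nops(program, slot_scale=1):
    """Return a copy of `program` with s_nop padding before any consumer that
    reads a register too soon after it was written. `slot_scale` multiplies
    every requirement (worst-case assumption about relative issue rates)."""
    out = []
    last_write = {}  # register -> (index in `out`, producer domain)
    for ins in program:
        needed = 0
        for reg in ins.src:
            if reg in last_write:
                pos, producer = last_write[reg]
                required = WAIT_STATES[(producer, ins.domain)] * slot_scale
                already = len(out) - pos - 1  # instructions issued since the write
                needed = max(needed, required - already)
        out.extend(Instr("s_nop", "salu") for _ in range(needed))
        out.append(ins)
        for reg in ins.dst:
            last_write[reg] = (len(out) - 1, ins.domain)
    return out

def count_nops(program):
    return sum(1 for ins in program if ins.op == "s_nop")

prog = [
    Instr("v_cmp_gt_f32", "valu", dst={"s0"}, src={"v0", "v1"}),  # VALU writes an SGPR
    Instr("v_add_f32", "valu", dst={"v2"}, src={"v3", "v4"}),     # independent filler
    Instr("s_and_b64", "salu", dst={"s2"}, src={"s0"}),           # SALU reads it
]
print(count_nops(insert_nops(prog)))                # 3
print(count_nops(insert_nops(prog, slot_scale=4)))  # 15
```

The independent v_add already covers one slot, so the normal case needs 3 NOPs; the worst-case scaling turns that into 15, which is the kind of bloat an overly pessimistic compiler would pay on every cross-domain hazard.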

There's another race condition for flat addressing that is incrementally more serializing with the indirection through the LDS done with the wavefront reformation proposal.
 
Raw GPU performance is less than 6% higher than R9 380X, while the bandwidth is 20% higher.

You can't compare bandwidth with desktop parts, as desktops don't have to deal with memory contention the way the PS4 does. The GDDR5 on the PS4 has to service both the CPU and the GPU, so effective bandwidth for the PS4 will generally be lower than the stated number.
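A crude way to see the effect, assuming a hypothetical linear contention model in which every GB/s of CPU traffic costs the GPU `penalty` GB/s (a penalty above 1 stands in for the extra overhead of interleaving two clients on one bus). The CPU traffic and penalty values below are made up for illustration; 218 GB/s is the rumored figure and 182.4 GB/s is the R9 380X's reference bandwidth.

```python
def gpu_effective_gbs(peak_gbs: float, cpu_gbs: float, penalty: float = 1.0) -> float:
    """HYPOTHETICAL model: GPU-visible bandwidth left after CPU traffic on a shared bus."""
    return max(0.0, peak_gbs - cpu_gbs * penalty)

R9_380X_GBS = 182.4       # dedicated VRAM, no CPU traffic on that bus
CONSOLE_PEAK_GBS = 218.0  # rumored figure, ASSUMED
CPU_TRAFFIC_GBS = 10.0    # HYPOTHETICAL CPU load

for penalty in (1.0, 2.0):  # HYPOTHETICAL contention penalties
    eff = gpu_effective_gbs(CONSOLE_PEAK_GBS, CPU_TRAFFIC_GBS, penalty)
    print(f"penalty {penalty}: {eff:.1f} GB/s, {(eff / R9_380X_GBS - 1) * 100:+.0f}% vs R9 380X")
```

Even with modest made-up numbers, the headline 20% advantage over the 380X shrinks quickly once the GPU's share of the bus is what gets compared.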

Regards,
SB
 