AMD: Speculation, Rumors, and Discussion (Archive)

fellix · Apr 19, 2016

227% more GPU throughput and 31% higher CPU clock, relying on only 23% bandwidth boost. Probably an inevitable compromise for a mid-term upgrade.

Ailuros · Apr 19, 2016

fellix said:
227% more GPU throughput and 31% higher CPU clock, relying on only 23% bandwidth boost. Probably an inevitable compromise for a mid-term upgrade.

Inevitably a compromise, whereby the latter's size depends on how much less bandwidth "hungry" Polaris could be. In other words, it could be an occasional compressed compromise.

fellix · Apr 19, 2016

Ailuros said:
Inevitably a compromise, whereby the latter's size depends on how much less bandwidth "hungry" Polaris could be. In other words, it could be an occasional compressed compromise.

That better be one hell of an efficient framebuffer compression.

Ailuros · Apr 19, 2016

"One hell" is a magnitude too high for my usual expectations

DuckThor Evil · Apr 19, 2016

fellix said:
227% more GPU throughput and 31% higher CPU clock, relying on only 23% bandwidth boost. Probably an inevitable compromise for a mid-term upgrade.

127% more GPU Throughput.
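The gap between the two figures is the usual "2.27x as much" versus "127% more" mix-up. A minimal sketch of the arithmetic, assuming the rumored 36 CUs at 911 MHz quoted later in the thread, the launch PS4's 18 CUs at 800 MHz, and the standard GCN peak of 64 lanes x 2 FLOPs (one FMA) per CU per clock. The CPU clocks (1.6 to 2.1 GHz) and bandwidth figures (176 to 218 GB/s) are assumptions taken from the rumored spec sheet, not from this thread:

```python
def gcn_tflops(cus: int, mhz: float) -> float:
    """Peak FP32 TFLOPS for a GCN part: CUs * 64 lanes * 2 FLOPs (FMA) * clock."""
    return cus * 64 * 2 * mhz * 1e6 / 1e12


def pct_more(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`, i.e. the 'X% more' figure."""
    return (new / old - 1.0) * 100.0


ps4 = gcn_tflops(18, 800)  # assumed launch PS4 configuration
neo = gcn_tflops(36, 911)  # rumored configuration, unconfirmed

print(f"PS4 {ps4:.2f} TFLOPS, rumored upgrade {neo:.2f} TFLOPS")
print(f"GPU: {neo / ps4:.2f}x as much, i.e. {pct_more(neo, ps4):.1f}% more")
print(f"CPU clock: {pct_more(2.1, 1.6):.1f}% more")    # assumed 1.6 -> 2.1 GHz
print(f"Bandwidth: {pct_more(218, 176):.1f}% more")    # assumed 176 -> 218 GB/s
```

Under those assumptions the ratio is about 2.28x, so "227%" reads as the ratio expressed as a percentage while "127% more" is the actual increase.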

no-X · Apr 19, 2016

fellix said:
227% more GPU throughput and 31% higher CPU clock, relying on only 23% bandwidth boost. Probably an inevitable compromise for a mid-term upgrade.

Raw GPU performance is less than 6 % higher than R9 380X, while the bandwidth is 20 % higher.
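For anyone wanting to check those two numbers, a rough sketch. The R9 380X figures (32 CUs at 970 MHz, 182.4 GB/s) are its reference specs as I recall them, and 218 GB/s is the rumored figure for the upgraded PS4, so treat both as assumptions:

```python
def gcn_tflops(cus: int, mhz: float) -> float:
    """Peak FP32 TFLOPS: CUs * 64 lanes * 2 FLOPs (FMA) * clock in MHz."""
    return cus * 64 * 2 * mhz / 1e6


# (peak TFLOPS, memory bandwidth in GB/s); all values are assumptions
neo_tflops, neo_bw = gcn_tflops(36, 911), 218.0      # rumored PS4 upgrade
r380x_tflops, r380x_bw = gcn_tflops(32, 970), 182.4  # R9 380X reference card

compute_gain = (neo_tflops / r380x_tflops - 1) * 100
bandwidth_gain = (neo_bw / r380x_bw - 1) * 100

print(f"Compute:   {compute_gain:.1f}% higher than R9 380X")
print(f"Bandwidth: {bandwidth_gain:.1f}% higher than R9 380X")
# Bandwidth available per unit of compute, in bytes per FLOP
print(f"Bytes/FLOP: rumored part {neo_bw / (neo_tflops * 1000):.4f}, "
      f"380X {r380x_bw / (r380x_tflops * 1000):.4f}")
```

This only compares peak numbers on paper; it says nothing about how much of that bandwidth the GPU actually gets in a shared-memory console.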

Deleted member 13524 · Apr 19, 2016

All of a sudden, Polaris 11 doesn't offer "console-level performance" at subnotebook form factor.

Crap, I had decided to be an early adopter this generation because it seemed promising enough. First the initial line-up isn't that great, then the Plus becomes crap, and finally, during the year when the really interesting exclusives start appearing (Uncharted 4, FF VII R, PSVR, Horizon Zero Dawn, Persona 5, etc.), Sony launches a new console that all these high-profile new games/platforms will certainly take advantage of.

Meh...

Another interesting metric is that this NEW(O) PS4, if these specs are right, has over 3x the GPU performance of the XBone.

Grall · Apr 19, 2016

SimBy said:
I'm guessing MS won't just follow but try to leapfrog it?

Leapfrogging Sony with their 32MB SRAM baggage they have to haul around for compatibility is going to be fairly expensive.

fellix said:
227% more GPU throughput and 31% higher CPU clock, relying on only 23% bandwidth boost.

Sourcing high-speed GDDR for a console potentially selling millions per quarter would probably be a significant headache, so they're settling again for middle-of-the-road-ish GDDR.

Razor1 · Apr 19, 2016

Ailuros said:
Colour me impressed for more than twice the added GPU power. I assume Microsoft will also follow pace?

Hoping the same thing!

Kaotik · Apr 19, 2016

Ailuros said:
Colour me impressed for more than twice the added GPU power. I assume Microsoft will also follow pace?

MS has so far declined any suggestions of "more powerful xbox1.5" or similar

Love_In_Rio · Apr 19, 2016

Razor1 said:
http://www.giantbomb.com/articles/sources-the-upgraded-playstation-4-is-codenamed-ne/1100-5437/
Polaris 10

Improved AMD GCN, 36 CUs at 911 MHz

So the new semi-custom wins are the updated PS4 and Nintendo NX. Just replacements or upgrades to consoles.

There is still one semi-custom win with a missing customer (Apple? MS?). They won two in 2014 and one in the middle of last year.

One interesting thing about this is that AMD is going to die-shrink the Puma cores to 14nm. Is this the first news we have about this?

Anarchist4000 · Apr 19, 2016

Kaotik said:
MS has so far declined any suggestions of "more powerful xbox1.5" or similar

As expected until it releases. Pretty sure there were some FCC filings about new MS console parts the other day.

Love_In_Rio said:
There is still one semi-custom win with a missing customer (Apple? MS?).

Microsoft would be the obvious one. What semi-custom parts would Apple need? TV? Maybe a VR headset?

Love_In_Rio · Apr 19, 2016

Anarchist4000 said:
As expected until it releases. Pretty sure there were some FCC filings about new MS console parts the other day.

Microsoft would be the obvious one. What semi-custom parts would Apple need? TV? Maybe a VR headset?

Rumor was related to iMacs:

http://wccftech.com/amd-making-custom-x86-soc-apple-imacs-2017-2018/

Kaotik · Apr 19, 2016

Anarchist4000 said:
As expected until it releases. Pretty sure there were some FCC filings about new MS console parts the other day.

Which fit the far longer rumoured "Xbox One Slim"

yuri · Apr 19, 2016

Love_In_Rio said:
One interesting thing about this is that AMD is going to die-shrink the Puma cores to 14nm. Is this the first news we have about this?

In fact, not the first news.

AMD was working on the Tiger (16nm FF) and Cheetah (14nm FF) projects at least in 2014 and 2015, respectively. The codenames fit AMD's feline core line pretty well.

However, in the past there were many scrapped projects in this particular family, for instance Lion (28nm BT?), Margay, or Leopard.

Anarchist4000 · Apr 19, 2016

yuri said:
AMD was working on the Tiger (16nm FF) and Cheetah (14nm FF) projects at least in 2014 and 2015, respectively.

I wonder if there's any relation to the scalar processors in GCN?

3dilettante · Apr 19, 2016

I don't think the scalar unit in GCN has much to do with AMD's Cat cores.

On that topic, some of the proposed changes for SIMD or wavefront handling in the patents might have implications for GCN as we know it. Items like the high-performance vector/scalar units could break code that has baked in wait states or additional vector operations for some of the aliased registers and sourcing between the scalar and vector domains.

A high-performance unit that breaks the 4-cycle cadence makes the vector path seem faster relative to the scalar path, and the variable-SIMD patent assumes a tighter latency for updating the front end with predication information. The former could cause code to misbehave unless the CU's domain-crossing penalties are reduced or properly interlocked. The latter would be more transparent since it speeds up forwarding of information.

Vector ops that source from a scalar register or from the LDS start to look different too, if the CU dynamically puts them on a higher issue rate. It wouldn't be so much for correctness as contention for shared resources if they remain shared at the same ratios and capabilities they have now.

Anarchist4000 · Apr 19, 2016

3dilettante said:
On that topic, some of the proposed changes for SIMD or wavefront handling in the patents might have implications for GCN as we know it. Items like the high-performance vector/scalar units could break code that has baked in wait states or additional vector operations for some of the aliased registers and sourcing between the scalar and vector domains.

Maybe, but that should still largely fall on the compiler to handle and detect. Just opt out of using it unless specifically called. The basic sync primitive they had seemed capable. That flexible scalar is a bit more robust than the wave level operations of SM6.0 I was expecting. The possibilities it opens likely make it worth reworking any compute code as the models would change dramatically and be easier to implement.

No reason you couldn't have a scalar for the CU managing all the scheduling. Dynamically creating waves and rearranging as necessary. Bonus points for an ARM core that could do all the CUs at once. Didn't they mention something like that as a cost saving measure for internal testing?

3dilettante · Apr 19, 2016

Anarchist4000 said:
Maybe, but that should still largely fall on the compiler to handle and detect. Just opt out of using it unless specifically called. The basic sync primitive they had seemed capable. That flexible scalar is a bit more robust than the wave level operations of SM6.0 I was expecting. The possibilities it opens likely make it worth reworking any compute code as the models would change dramatically and be easier to implement.

There are wait counts for the various memory accesses and exports, but NOP or independent instruction padding handles the class of operations that involve sourcing or forwarding between vector and scalar elements. It would be an unenviable bug to have to trace through for a problem that dynamically may or may not occur based on what SIMD the scheduler chooses. The alternative can be an excessively pessimistic compiler that inserts enough NOPs to satisfy the worst case where a wait state of several 4-cycle vector issues gets multiplied by a 4x scalar issue rate.
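To put a rough number on that pessimistic case, a toy sketch. It assumes a hazard that today needs some count of NOP-padded wait states, each equal to one 4-cycle vector issue slot, and a hypothetical unit that issues four times as often. The 5-wait-state hazard and the `nops_needed` helper are purely illustrative and are not taken from the patents or any shipping ISA:

```python
VECTOR_ISSUE_CYCLES = 4  # GCN's cadence: one vector issue per wavefront every 4 cycles


def hazard_cycles(wait_states: int) -> int:
    """Cycles the hazard must stay covered at today's 4-cycle cadence."""
    return wait_states * VECTOR_ISSUE_CYCLES


def nops_needed(wait_states: int, issue_rate_multiplier: int = 1) -> int:
    """NOPs required to cover the same cycle window when instructions
    issue `issue_rate_multiplier` times as often (hypothetical unit)."""
    cycles_per_issue = VECTOR_ISSUE_CYCLES // issue_rate_multiplier
    return hazard_cycles(wait_states) // cycles_per_issue


wait_states = 5  # illustrative hazard length, not from any specific ISA table
print(f"Window to cover: {hazard_cycles(wait_states)} cycles")
print(f"NOPs at normal cadence: {nops_needed(wait_states)}")
print(f"NOPs if the compiler assumes 4x issue: {nops_needed(wait_states, 4)}")
```

A compiler that cannot know which unit the scheduler picks would have to emit the larger count everywhere, which is the cost being described.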

There's another race condition for flat addressing that is incrementally more serializing with the indirection through the LDS done with the wavefront reformation proposal.

Silent_Buddha · Apr 19, 2016

no-X said:
Raw GPU performance is less than 6 % higher than R9 380X, while the bandwidth is 20 % higher.

Can't compare bandwidth with desktop parts, as desktops don't have to deal with memory contention like the PS4 has to. The GDDR5 on PS4 has to service both the CPU and GPU, so effective bandwidth for the PS4 will generally be lower than the number stated.

Regards,
SB
