Predict: Next gen console tech (9th iteration and 10th iteration edition) [2014 - 2017]

Status
Not open for further replies.
How much die space do you think would be left for the GPU if they went with an 8-core Zen?
That should give us an idea of the GPU power if they pair it like that.
The ~50mm2 range would be in keeping with the CPU footprint of Orbis. The ratio is likely much more skewed to the GPU with Neo.
The 25-30W range for the CPU may help determine what the GPU takes up, based on what Sony's power budget turns out to be.

One seemingly remote possibility from AMD's HPC plans is that the CPU and GPU won't be on the same chip anymore. Potentially, a significant chunk of the uncore and IO won't be on either the CPU or GPU.

The complete Jaguar 4-core module was about 26 mm2 at 28 nm... that means something like, I guess, 10 mm2 at 7 nm... so 20 cores would be around 10*5 = 50 mm2 at 7 nm, or 60 mm2 for 24 cores... I know Zen cores are much better, but maybe keeping compatibility is more important (with online sales taking on more and more importance, it would be nice to sell PS4 titles on PS5 as well). Another idea is that they use a heterogeneous approach... 8 old Jaguar cores + 8 new Zen cores... I see this in Switch too.
Jaguar's scaling is uncertain, as is its viability at even deeper nodes.
It seems likelier based on some of Cerny's statements that there's going to be a generational divide where absolute hardware equivalence will not be enforced, and catering to an ancient Jaguar core won't be required.
The mixed-core setup with big and little ARM chips also requires equal ISA support, which is not true for Jaguar and Zen. Their cache hierarchy, protocols, and other system elements may also be incompatible.
The base GPU architecture may also reach a limit in how far it can be tweaked before a genuinely new basis is needed.

Less-than-perfect emulation of the hardware might be possible. Otherwise, the quirky way Sony lays out its system, with the APU hooking indirectly into Sony's platform processor, may allow for something like a fat-PS3 solution.
 
The complete Jaguar 4-core module was about 26 mm2 at 28 nm... that means something like, I guess, 10 mm2 at 7 nm... so 20 cores would be around 10*5 = 50 mm2 at 7 nm, or 60 mm2 for 24 cores... I know Zen cores are much better, but maybe keeping compatibility is more important (with online sales taking on more and more importance, it would be nice to sell PS4 titles on PS5 as well). Another idea is that they use a heterogeneous approach... 8 old Jaguar cores + 8 new Zen cores... I see this in Switch too.

Switch doesn't activate the A53s.
While Switch doesn't activate those, a gazillion other SoCs do use mixed-and-matched ARM-compatible CPUs.
 
It seems likelier based on some of Cerny's statements that there's going to be a generational divide where absolute hardware equivalence will not be enforced, and catering to an ancient Jaguar core won't be required.
The mixed-core setup with big and little ARM chips also requires equal ISA support, which is not true for Jaguar and Zen. Their cache hierarchy, protocols, and other system elements may also be incompatible.
The base GPU architecture may also reach a limit in how far it can be tweaked before a genuinely new basis is needed.

Where there are ISA differences between processors, can you put up sort of a "virtual wall" for code so that it is unaware of another processor type, never accesses a cache it shouldn't, and circumvents the kind of issues you're talking about? Maybe a flag of some kind to say "don't read me if you're the other processor type - I don't exist".

I was thinking of code that can be designated as being able to access one processor type, or the other, or both. I don't know how it'd work, and it sounds like a bit of a nightmare, but could it be achievable?
 
I'm not sure SMT is going to be so useful in a console, where threads can be optimized not to interfere with each other even without it... and on PC I know most games ran better on Ryzen with SMT disabled. So at least this feature is not as useful as it seems... so the related silicon is probably a waste. In real console "life", the performance gains from adopting Ryzen instead of Jaguar remain to be demonstrated. Then, Ryzen "along with including the standard ISA, has a few new custom instructions that are AMD only"... so it seems the standard ISA is included in Ryzen, so Jaguar code can run on Ryzen but not the opposite (if some of those few special instructions are used). Right?!

The Jaguar core has support for the following instruction sets and instructions: MMX, SSE, SSE2, SSE3, SSSE3, SSE4a, SSE4.1, SSE4.2, AVX, F16C, CLMUL, AES, BMI1, MOVBE (Move Big-Endian instruction), XSAVE/XSAVEOPT, ABM (POPCNT/LZCNT), and AMD-V.[1]

https://en.wikipedia.org/wiki/Jaguar_(microarchitecture)#Instruction_set_support

Ryzen has.... some of the new commands are linked with ones that Intel already uses, such as RDSEED for random number generation, or SHA1/SHA256 for cryptography (even with the recent breakthrough in security). The two new instructions are CLZERO and PTE Coalescing. The first, CLZERO, is aimed to clear a cache line and is more aimed at the data center and HPC crowds. This allows a thread to clear a poisoned cache line atomically (in one cycle) in preparation for zero data structures. It also allows a level of repeatability when the cache line is filled with expected data. CLZERO support will be determined by a CPUID bit. PTE (Page Table Entry) Coalescing is the ability to combine small 4K page tables into 32K page tables, and is a software transparent implementation. This is useful for reducing the number of entries in the TLBs and the queues, but requires certain criteria of the data to be used within the branch predictor to be met.

http://www.anandtech.com/show/11170...review-a-deep-dive-on-1800x-1700x-and-1700/10
 
Jaguar x86-64 cores have a 40-bit physical address width, but Ryzen's is 48 bits... as far as I've discovered (not sure)....
 
I'm not sure SMT is going to be so useful in a console, where threads can be optimized not to interfere with each other even without it... and on PC I know most games ran better on Ryzen with SMT disabled.
That's not really accurate at all; SMT helps when you have low ILP. Part of Ryzen's issue was games treating all threads as cores (not being HT-aware). Console devs could look to exploit the extra throughput of SMT in situations where ILP is low. I'm sure there are many methods devs can come up with to control scheduling around SMT.

So at least this feature is not as useful as it seems... so the related silicon is probably a waste.
Hardly; there are very few resources dedicated to SMT in Zen, most structures are shared.

In real console "life", the performance gains from adopting Ryzen instead of Jaguar remain to be demonstrated. Then, Ryzen "along with including the standard ISA, has a few new custom instructions that are AMD only"... so it seems the standard ISA is included in Ryzen, so Jaguar code can run on Ryzen but not the opposite (if some of those few special instructions are used). Right?!
There are almost no AMD-only instructions in Zen; I'm aware of one that is exposed, and that is CLZERO. But really, the extensions devs would use that don't work with Jaguar are AVX2 and FMA.

Physical addressing doesn't matter; these all have MMUs and nowhere near enough memory for it to matter.
 
Different physical address sizes for the two CPU types may allow something a bit strange... within the same unified GDDR6 memory, controlled by the same northbridge, it would be possible to make some memory visible ONLY to the Zen cores, so that it contains just Zen-only code, with the rest of the memory holding the "Zen-not-only code" + Jaguar code + GPU code + data... I know it may sound a bit strange, maybe I'm crazy.... One Jaguar core is always used by the OS anyway, so the OS would not be able to control part of the memory... that's also strange, I know... then the 7 other Jaguar cores could help with collateral tasks where Jaguar works fine (in a PS5). I also think a 40-bit RAM APU (northbridge) consumes less silicon than a 48-bit one... so I'm not so sure it doesn't matter. Just crazy :).. So maybe a kind of very fast ESRAM visible only to the Zen cores, for AVX2 and such.
 
I'm not sure SMT is going to be so useful in a console, where threads can be optimized not to interfere with each other even without it... and on PC I know most games ran better on Ryzen with SMT disabled. So at least this feature is not as useful as it seems... so the related silicon is probably a waste.
SMT was very important last gen. Both Xbox 360 and PS3 had it. 30%+ performance gains were common (vs only one thread per core). Modern OoO cores can keep their pipelines better fed from a single thread of instructions, but modern OoO cores are also much wider than old dual-issue, in-order cores. The main reason why Ryzen SMT isn't useful in PC games yet is because most games don't scale to 16 threads. There simply isn't enough work to be done to saturate that many cores/threads. Intel's i7 7700K (quad core) gets some gains from Hyperthreading compared to the similarly clocked i5 model without it. Dual core i3 and Pentium models show significant gains from Hyperthreading in games. SMT is definitely a good technology for games, but it only starts helping when your game exceeds the parallelism available from the physical cores. As long as most AAA games are targeting current gen consoles (7x Jaguar cores), I don't expect to see big gains on PC from core counts higher than four. A 4 GHz dual core i3 is already fast enough to execute all those 7 Jaguar threads sequentially. SMT is helpful for dual core CPUs, but there just isn't enough work in most games to saturate SMT at 4+ cores.

The importance of SMT of course also depends on the core count. If next gen has quad core Ryzen, then SMT is going to be extremely important. But if there are 8+ Ryzen cores, then SMT is going to help a lot fewer games. But SMT is almost free (die space), so removing it wouldn't help much. Nintendo Switch used a stock Tegra X1, even though more than 10% of the die space was used by components not needed in a game console (camera related things). Customizing the cores by removing SMT would be expensive. Cost reduction in manufacturing would be tiny, and the design would be less future proof. Not worth it.
 
Customizing the cores by removing SMT would be expensive. Cost reduction in manufacturing would be tiny, and the design would be less future proof. Not worth it.
Aren't AMD releasing non-SMT Ryzen chips?
Then customizing the cores wouldn't be a factor, and AMD would be selling the non-SMT design for cheaper than the enabled ones.
 
Aren't AMD releasing non-SMT Ryzen chips?
Then customizing the cores wouldn't be a factor, and AMD would be selling the non-SMT design for cheaper than the enabled ones.
AMD is releasing some budget models with SMT disabled and some cores disabled. But those chips still have SMT hardware, it is simply broken or untested. SMT adds a tiny amount of transistors. I doubt many chips have SMT broken.

Intel disables SMT on all desktop i5 chips because they want to charge more for desktop i7 chips. Professional users want SMT and are willing to pay extra for it. Enabling it in all i5 processors would make the i7 obsolete. This is a pure business decision. Intel has plenty of i3 models and even Pentiums nowadays with SMT enabled. Almost all of their mobile CPUs also have SMT enabled. Different markets = different product segments.
 
AMD is releasing some budget models with SMT disabled and some cores disabled. But those chips still have SMT hardware, it is simply broken or untested. SMT adds a tiny amount of transistors. I doubt many chips have SMT broken.

Intel disables SMT on all desktop i5 chips because they want to charge more for desktop i7 chips. Professional users want SMT and are willing to pay extra for it. Enabling it in all i5 processors would make the i7 obsolete. This is a pure business decision. Intel has plenty of i3 models and even Pentiums nowadays with SMT enabled. Almost all of their mobile CPUs also have SMT enabled. Different markets = different product segments.
So when AMD offers Ryzen cores, does it only sell them SMT-enabled, with the client deciding whether to build with it enabled or disabled?

I understand your point about it being purely a business decision, but that's AMD selling chips, not licensing the design to other companies to build, like in the console space.
 
So when AMD offers Ryzen cores, does it only sell them SMT-enabled, with the client deciding whether to build with it enabled or disabled?

I understand your point about it being purely a business decision, but that's AMD selling chips, not licensing the design to other companies to build, like in the console space.
A console maker can spend xx million upfront + yy margin + 24-odd months to produce a CPU using already-available components (a Zen CCX). Or they can spend XXX million upfront + yy margin + 48-60 months to produce a custom CPU that might still crash and burn.

You think MS and Sony wanted 1.6 GHz Jaguars? But given the options at the time, it's what they could afford in terms of both time and cost.
 
A console maker can spend xx million upfront + yy margin + 24-odd months to produce a CPU using already-available components (a Zen CCX). Or they can spend XXX million upfront + yy margin + 48-60 months to produce a custom CPU that might still crash and burn.

You think MS and Sony wanted 1.6 GHz Jaguars? But given the options at the time, it's what they could afford in terms of both time and cost.
Not my point/question.

My question is whether AMD sells it as an SMT-enabled design or not, or if it's only sold SMT-enabled.

I do think MS and Sony wanted 1.6 GHz 8-core Jaguars; given their power, size and cost profiles, it did what they wanted.
It's us who would have liked more.
Given the overall specs of the systems, I think it was adequate, personally.
 
My question is whether AMD sells it as an SMT-enabled design or not, or if it's only sold SMT-enabled.
SMT is a technical feature that is already implemented. It doesn't save any cost to disable it. If they sell the design with SMT disabled, the customer will get a part with lower perf and lower perf/watt. Obviously the customer is likely willing to pay less money for a chip that is partially broken/disabled. Also, AMD semi-custom division isn't selling chips directly to customers. They license IP, and the customer produces the chips using third party fabs of their choice. The customer makes the call whether they want to disable some features of the IP (to improve yields). I don't see a good reason why AMD would sell IP rights to somebody, but do not allow them to use all the IP features.

The consumer market is a completely different matter.
 
SMT is a technical feature that is already implemented. It doesn't save any cost to disable it. If they sell the design with SMT disabled, the customer will get a part with lower perf and lower perf/watt. Obviously the customer is likely willing to pay less money for a chip that is partially broken/disabled. Also, AMD semi-custom division isn't selling chips directly to customers. They license IP, and the customer produces the chips using third party fabs of their choice. The customer makes the call whether they want to disable some features of the IP (to improve yields). I don't see a good reason why AMD would sell IP rights to somebody, but do not allow them to use all the IP features.

The consumer market is a completely different matter.
Thanks, this is what I was wondering:
whether it's only disabled at the fab, or whether the design could also easily take it into account when selling the IP, since it isn't AMD producing the chip for consumers.

Anyway, sounds very unlikely.
 
Where there are ISA differences between processors, can you put up sort of a "virtual wall" for code so that it is unaware of another processor type, never accesses a cache it shouldn't, and circumvents the kind of issues you're talking about? Maybe a flag of some kind to say "don't read me if you're the other processor type - I don't exist".

I was thinking of code that can be designated as being able to access one processor type, or the other, or both. I don't know how it'd work, and it sounds like a bit of a nightmare, but could it be achievable?

There is a range of options for handling architectural differences when running a virtualized system. The host can run full-on emulation, or if the processors in question are in the same lineage it can have partially native support with the hypervisor set to step in whenever an unrecognized instruction is encountered. Neither would be a practical solution if the desire is to run software at speed.

Managing coherence is something else, and while there are methods for tweaking coherence protocols so that a more complex one can coexist with another, it's mostly agnostic to the ISA or the architecture. The cache wouldn't be in the position to pick and choose.

This sounds more like a highly partitioned environment, rather than the design space for ARM with big and little cores potentially running the same software and possibly trading threads dynamically.

Separating the memory and ISAs more strongly has been done. For example, HSA has a form of shared memory and carefully isolated software and caching. IO-coherent secondary processors are found everywhere, and one example of independent host processors would be Sony's southbridge chip and the APU--both running independent memory spaces and operating systems and linked through an IO-coherent interface.
 
So AVX2 and FMA are what's really lacking on the Jaguar CPUs... thank you. I guess with a very fast and wide RAM bus (maybe with HBM2) other problems are possibly much, much less relevant than these two. As I understand it, AMD is working on a new hyper-fusion of Vega + Ryzen + HBM2... I don't know how they can fit (and keep cool) such a huge number of transistors together... but we'll see... maybe when they reach 7 nm. Is that why GlobalFoundries is aiming directly for 7 nm?! Maybe. Would it otherwise be possible for AMD to evolve Jaguar by integrating AVX2 and FMA into it, and at the same time shrink everything to 7 nm?! If both MS and Sony ask for that, maybe it's possible. I still see Vega + Ryzen + HBM2 as a huge and expensive transistor budget. One thing is sure, IMHO... we will not see a new console before 2020.
 
HBM2 is unlikely given how expensive and power hungry it is supposed to be.

HBM3 is supposed to be both cheaper and lower power. Hence if the next-gen consoles don't use GDDR6, then I would imagine HBM3 is more likely.
 
HBM2 is unlikely given how expensive and power hungry it is supposed to be.

HBM3 is supposed to be both cheaper and lower power. Hence if the next-gen consoles don't use GDDR6, then I would imagine HBM3 is more likely.
Wasn't HBM2 supposed to be less power hungry than GDDR5 while providing more performance? Some time ago it sounded like the next big thing to me.
 