Next Generation Hardware Speculation with a Technical Spin [pre E3 2019]

Discussion in 'Console Technology' started by TheAlSpark, Dec 31, 2018.

Thread Status:
Not open for further replies.
  1. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    41,009
    Likes Received:
    11,625
    Location:
    Under my bridge
    Plus one.
     
  2. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    5,525
    Likes Received:
    4,042
    https://www.google.com/amp/s/seekin...-plus-19-percent-samsung-settlement-sony-deal
    They don't do much outside of exploiting their generic patents for rumble motors. But they have patents for dual axis force feedback.

    Pretty sure that means the DS5 will use a pair of Alps Haptic Reactor (and probably the next Move controller iteration will have one).
     
    #2062 MrFox, May 14, 2019
    Last edited: May 14, 2019
    Shortbread and BRiT like this.
  3. bbot

    Regular

    Joined:
    Apr 20, 2002
    Messages:
    742
    Likes Received:
    10
    Back to my original idea of Anaconda using a custom Navi 20: some were saying that using all 64 CUs would be expensive. I considered how much the clock speed would have to be boosted to reach 12 TF if four CUs were disabled (60 CUs). It turns out the clock would have to be boosted from 1475 MHz to 1575 MHz. The problem is, according to AdoredTV, Navi has power and thermal issues. Would this be reasonable?
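    The arithmetic above is easy to check. A minimal sketch, assuming the usual GCN/RDNA peak-FLOPS formula (CUs × 64 ALUs × 2 FLOPs per clock × clock); the function names are illustrative, not from any real API:

    ```python
    # Peak-FLOPS arithmetic for the post above. Assumes the usual GCN/RDNA
    # formula: TFLOPS = CUs x 64 ALUs x 2 FLOPs/clock x clock.

    def tflops(cus: int, clock_mhz: float) -> float:
        """Peak FP32 TFLOPS for a given CU count and clock."""
        return cus * 64 * 2 * clock_mhz * 1e6 / 1e12

    def clock_for_target(cus: int, target_tflops: float) -> float:
        """Clock (MHz) needed to hit a TFLOPS target with a given CU count."""
        return target_tflops * 1e12 / (cus * 64 * 2 * 1e6)

    print(round(clock_for_target(64, 12.0)))  # 1465 MHz with all 64 CUs
    print(round(clock_for_target(60, 12.0)))  # 1562 MHz with 4 CUs disabled
    ```

    Under this formula, 60 CUs need just over 1560 MHz for 12 TF, so the 1475 → 1575 MHz figures in the post are in the right ballpark.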
     
  4. bbot

    Regular

    Joined:
    Apr 20, 2002
    Messages:
    742
    Likes Received:
    10
    Another possibility is to use Navi GPU chiplets, each with 36 CUs, four of them redundant. Lockhart would use one chiplet and Anaconda would use two.
     
  5. Shortbread

    Shortbread Island Hopper
    Veteran

    Joined:
    Jul 1, 2013
    Messages:
    3,879
    Likes Received:
    2,019
    Although I have mentioned such a design, I still have my worries about a multi-GPU chiplet console, mostly stemming from how multi-GPU scaling (i.e., CrossFire, SLI, etc.) in PCs isn't always effective at handling split-rendering or AFR workloads. And the high possibility of Navi still being tied to the GCN microarchitecture gives me even more pause about how effective a GPU chiplet design would be within the console space:
    * What type of thermals/watts are we talking about within the console space?
    * Will there be 100% perfect scaling between the chiplets?
    * Will a chiplet go unused if developers find graphics scaling ineffective (i.e., micro-stuttering, AFR tearing from synchronization hiccups, AFR banding issues, etc.), or if it breaks their engine's graphics pipeline completely (see: modern-day Unreal Engine games), no matter how they implement multi-GPU scaling?
    * What can Sony or Microsoft provide developers to make multi-GPU scaling more effective in the console space, especially when Nvidia, AMD and Microsoft haven't provided a 100% effective solution within the PC gaming space for decades?

    I'm not totally against the novelty of seeing a multi-GPU chiplet design in a console (if multi-GPU scaling issues weren't a factor); however, the expectations of such a design can cause more headaches than good. Maybe a monolithic APU that can support more GPU CUs, or a discrete GPU/CPU design allowing for a fully fleshed-out GPU and higher clocks, would be better. IMHO, a monolithic APU or a more discrete design would make developers' lives easier when it comes to game design, and this is coming from a guy who owns nothing but multi-GPU PC setups.
     
    Heinrich4 likes this.
  6. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    1,988
    Likes Received:
    1,123
    Think you've answered it here.
    A console, not a PC.
    A closed box with set components to target.
    Everything is more stable and set, from hardware to drivers.
    That's huge in this regard from the get-go.

    Also, the actual architectural implementation would be different. It wouldn't have multiple memory pools, etc.
     
    #2066 Jay, May 15, 2019
    Last edited: May 15, 2019
    mahtel, MBTP and milk like this.
  7. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    398
    Likes Received:
    193
    Expanding on ideas from an article about this that I wrote for my webpage.

    Double chiplets may have advantages:

    1 - Lockhart would use one, Anaconda two. This would increase orders and might drop price.

    2 - Even Dante chiplets could be used if damaged.

    3 - No more 64 CU limit!

    But it also has disadvantages

    1 - 8 non-working CUs in total, four per chiplet (double the cost on dead area). Double memory controllers, double buses, double Infinity Fabric, double everything. Basically you would have two GPUs, with the cost of two full GPUs and the power consumption of two full GPUs. I guess this would kill the first advantage.

    2 - Dante Chiplets are bigger. Adaptation would require a motherboard revision and a different cooling system.

    3 - As stated, power consumption would be expected to be higher than with a single GPU having double the CUs. Also, according to AMD, you would not get the same performance as a single GPU. That is why AMD is not using this solution.

    Although consoles are the perfect environment for these solutions, this one would not scale well on a lot of engines and would require extra work. This would be eSRAM all over again! Programmers do not want exotic solutions; they want something simple to use.

    In other words... I have serious doubts about that solution being viable.
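    The trade-off in points 1 and 3 can be roughed out with a first-order die-cost model. A minimal sketch under loudly assumed numbers (Poisson yield, made-up defect density, per-mm² cost and die areas — nothing here comes from AMD or the thread), which deliberately ignores the packaging, interconnect and duplicated-uncore costs that this post argues erase the benefit:

    ```python
    import math

    # Illustrative first-order die-cost comparison. All constants below are
    # assumptions for the sketch, not real process or die figures.

    def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
        """Fraction of good dice under a simple Poisson defect model."""
        return math.exp(-area_mm2 * defects_per_mm2)

    def cost_per_good_die(area_mm2: float, defects_per_mm2: float,
                          cost_per_mm2: float = 0.1) -> float:
        """Silicon cost divided by yield: what each *good* die costs."""
        return area_mm2 * cost_per_mm2 / poisson_yield(area_mm2, defects_per_mm2)

    D0 = 0.002  # assumed defects per mm^2
    mono = cost_per_good_die(360.0, D0)         # hypothetical 64-CU monolithic die
    chiplet = 2 * cost_per_good_die(200.0, D0)  # two hypothetical 36-CU chiplets
    print(f"monolithic ~{mono:.1f} vs two chiplets ~{chiplet:.1f} (arbitrary units)")
    ```

    Under these made-up numbers the smaller dice yield well enough that two chiplets come out cheaper per good die, which is the usual argument for chiplets; the counterpoint above is that packaging, duplicated controllers and higher power sit outside this model and can swallow the saving. Redundant-CU salvage (which improves monolithic yield too) is also not modeled.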
     
    #2067 Metal_Spirit, May 15, 2019
    Last edited: May 15, 2019
    Shortbread and function like this.
  8. Shortbread

    Shortbread Island Hopper
    Veteran

    Joined:
    Jul 1, 2013
    Messages:
    3,879
    Likes Received:
    2,019
    Not really. Even a multi-GPU chiplet design would still succumb to multi-path rendering scaling issues, even in a closed-box system (console). These issues don't magically disappear, especially when synchronization, scheduling, frame interpolation, MCFI, and all manner of post-processing effects can be adversely affected by the slightest hiccups, such as one chiplet being more sensitive than the other to temperature or voltage changes (even in an APU design), which can throttle the whole pipeline/render, leading to more screen tearing, major framerate drops, and banding that looks like an interlaced resolution is being used. Even if temperature and voltage changes weren't a concern, rendering stalls (i.e., micro-stutters, etc.) are pretty much a given when two chips are trying to schedule/coordinate/interpolate/render between each other. The headaches of such a design simply aren't going away because of a closed-box design.
     
    #2068 Shortbread, May 15, 2019
    Last edited: May 16, 2019
    JPT, Heinrich4 and function like this.
  9. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,137
    Likes Received:
    2,253
    Location:
    Wrong thread
    I think MS would want a solution that required no additional developer overhead. There's always going to be consideration about allocating work when some of your execution units and memory come with additional penalties.
     
    TheAlSpark and Shortbread like this.
  10. liem107

    Newcomer

    Joined:
    Feb 12, 2010
    Messages:
    47
    Likes Received:
    11
    Wouldn't the chiplet design resolve the multi-GPU rendering problem instead?
    I mean, can the I/O chip virtualise the GPU so that two GPU chiplets appear as one? All the scheduling logic (and unified L2/L3 cache) would be in the I/O chip, and each chiplet would just have the CU array and a small L1 cache; same for the CPU chiplets with their L1 caches.
    In fact, I wonder if the chiplet design would not also help realise the full HSA vision with a fully unified memory pool.
    One I/O controller overmind chip to rule them all (CPU and GPU)...
    Is it not a practical solution?
     
    Heinrich4 and Jay like this.
  11. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    20,813
    Likes Received:
    5,911
    Location:
    ಠ_ಠ
    All loads lead to Rome? (even those descended from the heavens). /flees

    It does sound a bit like decoupling the front end of the GPU, and the upshot is that GPUs care less about latency than CPUs. That would have some interesting implications for PCs, although I think the elephant in the room is the bandwidth required between hypothetical GPU chiplets (i.e. interconnects). I could be mistaken, but AMD doesn't seem to have as good an analogue to Intel's EMIB without resorting to TSVs.

    On the other hand, maybe the GPU L2 & ROPs would be on such an I/O die?

    hm.
     
    #2071 TheAlSpark, May 15, 2019
    Last edited: May 15, 2019
    vjPiedPiper, Picao84 and Jay like this.
  12. Shortbread

    Shortbread Island Hopper
    Veteran

    Joined:
    Jul 1, 2013
    Messages:
    3,879
    Likes Received:
    2,019
    Even multi-million-dollar render farms with specialized I/O chips or boards coordinating between hundreds of GPUs still exhibit these issues. That's why multipass rendering is such a thing within film and CGI/offline workloads: not just for visual improvement (i.e., clarity, additional detail, etc.), but more so for recovering any missed data or frames that can be lost during multi-GPU workloads/setups. And these tasks (schedule/coordinate/interpolate/render) become more complex within a real-time environment (videogames), which has no multipass equivalent to guard against such stalls/errors.

    And don't get me wrong, I love AMD just like the next person, but seriously, if Nvidia hasn't resolved any of the issues that come along with multi-GPU designs and scaling, I can't picture AMD resolving them within an APU design housing multiple GPU chiplets.
     
    #2072 Shortbread, May 15, 2019
    Last edited: May 15, 2019
    see colon and Picao84 like this.
  13. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    8,101
    Likes Received:
    6,373
    Render farms, as I understand it, still have a lot of different challenges when it comes to texture sizes etc. GPUs need to work out of their own memory pools at the moment, and the textures get so large they max out memory quite quickly. There's a slew of challenges that render farms have to tackle that real-time doesn't, and vice versa. That being said, it's why we still see CPU rendering being part of the equation in that industry.

    We're at a point where the APIs support mGPU much better, with more freedom and creativity; we need a baseline like a console to move it forward properly in engines.
     
    Jay likes this.
  14. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    1,988
    Likes Received:
    1,123
    First of all, thanks; this is the kind of feedback I was hoping for.

    AMD said the reason they haven't used chiplets with GPUs is the way they are exposed to the developer. They also said this isn't an issue for servers, so I would expect it in that space.
    Not sure I understand double cost on dead area.
    Why would the memory controllers be on the chiplets? I would expect the I/O and memory controllers to be on a single die.

    I'm also unsure if it would be possible to move some of the front end off the chiplets as well.

    Having ~64 CUs and having to clock them high must generate a lot of heat compared to chiplets that add up to many more CUs and can be clocked lower; less heat, though I'm not sure how power usage would compare.

    What is Dante?
    I would go for off-the-shelf Zen chiplets and custom GPU chiplets.
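    The wide-and-slow versus narrow-and-fast intuition can be made concrete with the common rule of thumb that dynamic power scales with C·V²·f and that voltage must rise roughly linearly with clock, giving power ~ f³ per CU. A minimal sketch with illustrative configurations (the CU counts and clocks are made up to hit the same ~12 TF; none of these figures come from the thread):

    ```python
    # Rough model: per-CU dynamic power ~ (f/f0)^3 once voltage scaling is
    # folded in. All figures are illustrative, not real silicon data.

    def tflops(cus: int, clock_ghz: float) -> float:
        """Peak FP32 TFLOPS assuming 64 ALUs x 2 FLOPs per clock per CU."""
        return cus * 64 * 2 * clock_ghz / 1000.0

    def relative_power(cus: int, clock_ghz: float, base_ghz: float = 1.0) -> float:
        """Power in arbitrary units relative to one CU at the base clock."""
        return cus * (clock_ghz / base_ghz) ** 3

    narrow = (60, 1.563)  # fewer CUs, high clock
    wide = (80, 1.172)    # more CUs, low clock
    for cus, clk in (narrow, wide):
        print(f"{cus} CUs @ {clk} GHz: {tflops(cus, clk):.1f} TF, "
              f"power ~{relative_power(cus, clk):.0f} (arb. units)")
    ```

    Both configurations land at roughly 12 TF, but the 80-CU version comes out well under the 60-CU version on this power model. Real silicon deviates from the f³ rule, so treat it as direction, not magnitude.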
     
    MBTP likes this.
  15. Shortbread

    Shortbread Island Hopper
    Veteran

    Joined:
    Jul 1, 2013
    Messages:
    3,879
    Likes Received:
    2,019
    I get all this. And as I stated before, the novelty of an APU housing multiple GPU chiplets sounds interesting; however, it doesn't make sense when there are valid issues with such a design. If Sony or Microsoft (along with AMD) have resolved these issues and can show the thermal/wattage advantages over a discrete GPU/CPU design, then more power to them. But I don't see it happening.
     
  16. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    8,101
    Likes Received:
    6,373
    yea, I don't see it either ;)

    I'm not big on the chiplet idea. Assembly and chip costs are too high for a product that has to be dirt cheap.
     
    Shortbread likes this.
  17. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    20,813
    Likes Received:
    5,911
    Location:
    ಠ_ಠ
    Just you wait, an I/O die fabbed on GF (IBM) 14nm SOI with megawads of eDRAM. :V

    I keed.
     
    Shortbread likes this.
  18. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    1,988
    Likes Received:
    1,123
    Depends on the amount of work and the difficulty of implementing something.
    Otherwise we may as well forget about any progress at all, as it all requires some level of developer input. It'd be interesting to hear, as I suspect changes in engine design over the last 10 years would lend themselves to it better.

    Regarding synchronisation, etc.: engines already have to deal with asynchronous compute.
    I'm not expecting it to necessarily do alternate frame rendering.

    DX12 mGPU looks a lot better compared to what was being done prior.
     
  19. Shortbread

    Shortbread Island Hopper
    Veteran

    Joined:
    Jul 1, 2013
    Messages:
    3,879
    Likes Received:
    2,019
    Anyone want to do a rundown of the potential pros and cons (i.e., clocks, TDP, size, cost, etc.) of a monolithic APU housing an 8-core Ryzen CPU and a 64-CU GPU, as well as the pros/cons of a discrete CPU/GPU design?
     
    Adonisds and Jay like this.
  20. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    1,988
    Likes Received:
    1,123
    These are also some of the reasons I dismissed it a while ago.

    Given the number of SKUs, the performance differentiation, and the fact that it's already in use for CPUs, it's made me question it as a possibility again.
     