Next Generation Hardware Speculation with a Technical Spin [post E3 2019, pre GDC 2020] [XBSX, PS5]

Your DDR4 will need a separate bus, which takes pins and power that could have been used for a wider GDDR6 bus and thus a faster system.

Cheers
Not sure that offsets the GDDR6 reserve, and it also doesn't mean you would actually get a wider bus without it.
 
In previous generations Sony have tried unique memory configurations for consoles:

PS1 EDO DRAM
PS2 Rambus DRAM and EDRAM
PSV wide-band VRAM
PS4 unified GDDR5 pool
PS4 Pro GDDR5 + DDR3

What else can Sony use?

IMO unified 16GB GDDR6 is not the most efficient and Sony tends to try special configurations.
For example

16GB GDDR6 for GPU
8GB DDR4 for CPU / OS & Apps

How about wide-band stacked RAM like PSVITA?
The DDR3 in the PS4 and PS4 Pro doesn't really count; it was a storage cache on the southbridge and wasn't used by either the CPU or the GPU. Both the PS4 and the Pro were technically unified GDDR5, which has proven to be the most effective approach when dealing with a single SoC.

It's hard to imagine anything better than either unified GDDR6 or unified HBM2e, and GDDR6 wins so far because it's less expensive.

The DDR3 should be gone since there's super-fast storage now; it won't need that buffer anymore. Unless there's still a need for some RAM buffer on the southbridge for other purposes, though I can't imagine why.
 
In previous generations Sony have tried unique memory configurations for consoles:

PS1 EDO DRAM
PS2 Rambus DRAM and EDRAM
PSV wide-band VRAM
PS4 unified GDDR5 pool
PS4 Pro GDDR5 + DDR3

What else can Sony use?

IMO unified 16GB GDDR6 is not the most efficient and Sony tends to try special configurations.
For example

16GB GDDR6 for GPU
8GB DDR4 for CPU / OS & Apps

How about wide-band stacked RAM like PSVITA?
The Vita method is impractical due to heating concerns; they mounted the dies face to face.


Use a mix of higher density chips and shove the OS reservation on the higher addresses? :p

*cough*

Clamshell: 7 × (2×1GB) + 1 × (2×2GB) = 18GB = 16GB + 2GB OS (256-bit bus) :runaway:
I think that’s what MS is doing with XSX. No need for clamshell though. 16Gb chips are readily available.
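If anyone wants to sanity-check the mixed-density arithmetic, here's a quick back-of-envelope sketch in Python. The channel split and the 14 Gbps pin speed are my own illustrative assumptions, not confirmed specs:

Code:
# Back-of-envelope numbers for the mixed-density idea above.
# Assumptions (mine, not confirmed specs): 256-bit bus split into eight
# 32-bit channels, clamshell (two x16 chips per channel), 14 Gbps per pin.
BUS_WIDTH_BITS = 256
PIN_SPEED_GBPS = 14                              # assumed pin speed

# Seven channels populated with 2 x 1GB chips, one channel with 2 x 2GB.
capacity_gb = 7 * (2 * 1) + 1 * (2 * 2)          # 14 + 4 = 18 GB
bandwidth_gbs = BUS_WIDTH_BITS * PIN_SPEED_GBPS / 8

print(f"Capacity: {capacity_gb} GB ({capacity_gb - 2} GB game + 2 GB OS)")
print(f"Peak bandwidth: {bandwidth_gbs:.0f} GB/s over a {BUS_WIDTH_BITS}-bit bus")
# Note: only the address range common to all channels interleaves across the
# full bus; the extra 2 GB on the denser channel sees lower peak bandwidth.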
 
A fast NVMe SSD can probably swap those apps to disk in a split second. No need to keep stuff in RAM that is not being used.

That is reasonable for apps. But for an improvement in persistent destructible/changing sandboxes I would prefer more free memory than we have these days. I seriously doubt they would want such state changes to be permanently written to the SSD.

Revisiting your kills from months ago in Division X, rotting away on the mountains, could surely add to the whole atmosphere and be helpful to duck behind. :)
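On the "split second" claim, here is a trivial sketch of how long streaming an app's resident memory to or from the SSD might take. The throughput figure and the overhead factor are assumptions for illustration only, not known console specs:

Code:
# Rough suspend/resume swap-time estimate. All numbers are illustrative
# assumptions, not known console specs.
def swap_time_seconds(resident_gb, ssd_gb_per_s, overhead=1.2):
    """Time to stream an app's resident memory to or from the SSD;
    'overhead' loosely covers filesystem and queueing costs."""
    return resident_gb / ssd_gb_per_s * overhead

for app_gb in (0.5, 2.0, 8.0):
    t = swap_time_seconds(app_gb, ssd_gb_per_s=5.0)   # assume ~5 GB/s raw
    print(f"{app_gb:>4} GB resident @ ~5 GB/s: about {t:.2f} s")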
 
A fast NVMe SSD can probably swap those apps to disk in a split second. No need to keep stuff in RAM that is not being used.

The DDR3 should be gone since there's super-fast storage now; it won't need that buffer anymore. Unless there's still a need for some RAM buffer on the southbridge for other purposes, though I can't imagine why.
We will need some GDDR6 for the currently executing app, right?
(Let’s say 500MB)

Does that mean the console should move some game data from GDDR6 RAM onto the SSD?
 
We will need some GDDR6 for the currently executing app, right?
(Let’s say 500MB)

Does that mean the console should move some game data from GDDR6 RAM onto the SSD?

Yes. And that's very likely what Microsoft is doing with the feature that allows multiple games to be suspended and instantly restored to where you left off.

It's all about being smarter, not about brute force.
 
Your DDR4 will need a separate bus, which takes pins and power that could have been used for a wider GDDR6 bus and thus a faster system.

Cheers
Assume 16GB GDDR6 @ 576GB/s plus 8GB DDR4 for the CPU and OS/apps. If we change to a unified GDDR6 pool instead, what would it look like with a similar BOM and power consumption?
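To put rough numbers on that comparison, here is a tiny sketch of the raw peak-bandwidth math. The 18 Gbps GDDR6 pin speed, the DDR4-3200 128-bit bus, and the idea that the freed DDR4 pins could buy two more GDDR6 channels are all assumptions for illustration:

Code:
# Compare a split pool against a wider unified GDDR6 bus, peak numbers only.
# Chip speeds and bus widths below are illustrative assumptions.
def gddr6_gbs(bus_bits, gbps_per_pin=18):
    return bus_bits * gbps_per_pin / 8             # GB/s

def ddr4_gbs(bus_bits, mega_transfers=3200):
    return bus_bits / 8 * mega_transfers / 1000    # GB/s

split = gddr6_gbs(256) + ddr4_gbs(128)     # 576 + 51.2 GB/s, across two buses
unified = gddr6_gbs(320)                   # 720 GB/s if the DDR4 pins were
                                           # spent on two more GDDR6 channels
print(f"Split pools: {split:.1f} GB/s total "
      f"({gddr6_gbs(256):.0f} GDDR6 + {ddr4_gbs(128):.1f} DDR4)")
print(f"Unified:     {unified:.1f} GB/s on one 320-bit GDDR6 bus")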
 
If there is GDDR6 and some other memory pool, that likely comes with an ARM CPU. That would be interesting in many ways. Could some low-power tasks run while the x86 SoC and GDDR6 are powered down? Would that ARM core handle the SSD and networking? Could it even run some apps? No fans turning at all when running Netflix on ARM?
 
Once you already have an additional chip to handle some OS functions and IO like the PS4 had (and hopefully it can properly do background downloads on its own this time), you might as well beef the guy up a bit so it can run the whole OS, give it some healthy amount of low-cost memory, and even partition a portion of it for game use. There is plenty of stuff in a game that doesn't really need ultra-fast bandwidth, but that isn't worth tapping the SSD all the time for. That can give you a couple of extra gigs of usable memory for the running game at relatively low cost. If the extra chip acts as a southbridge, put the extra pins and bus there, so the game SoC pays no cost for the hybrid memory.
 
Use a mix of higher density chips and shove the OS reservation on the higher addresses? :p

*cough*

Clamshell: 7 × (2×1GB) + 1 × (2×2GB) = 18GB = 16GB + 2GB OS (256-bit bus) :runaway:

In Sony's SSD patent they have mentioned a small on-board CPU for offloading storage work from the main CPU.

They should beef the SSD CPU up [heck, they could use a few Jaguar cores], enlarge the SSD cache with a few gigs of low-power DDR, and put the OS directly there. That would free up all 8c/16t for gaming. :cool::LOL:
 
In your opinion, might the strange heatsink patent be a means of achieving something similar to the Vita method?
Theoretically, yes, but I have a hard time believing they couldn’t achieve similar results with memory die on the same side in a 2.5D package. I think a dual sided cooling solution would be more about improving the overall thermal sinking from main APU. The heatsink is not the only path of thermal relief for a die (there is some heat spreading in the board), but it’s certainly the best path.

Putting memory on the other side has other advantages, I think, that may sway the balance. Your package design can become smaller (compared to a same-side 2.5D), and you can probably get away without an interposer, with the package taking care of the HDI like in TSMC's InFO-MS packaging.

The other advantage is that you can use a smaller, cheaper PWB. To route GDDR6 you need a low dielectric PWB with controlled copper roughness and you care about the directionality of the weave. If you can move main memory traffic off the PWB, you can probably get away with a standard FR4 board with less tightly controlled roughness and tolerances and save a few dollars per console that way.
 
John Carmack not a fan of dedicated chips for audio or physics in nextgen consoles


I think that this can be argued in many cases, since disparate resources increase development load and the typical methods for interfacing or chaining these resources tend to be restrictive.
I would say there are some "organizational" or systemic reasons why such elements may persist.
For example, there are benefits in terms of platform-level services and features in having commonly used capabilities on-hand with a small footprint in system resources and developer effort, particularly if dealing with smaller teams or staff roles that don't have coding or DSP management in their skillset.
There are also structural concerns in terms of what is most practical for the silicon elements of the architecture, and what shortcomings may present themselves in the more centralized resources. The PS4's more general-purpose resources were flexible in principle, but came with specifically problematic caveats.

More commercial concerns can drive unfortunate organizational requirements--DRM, licensing, patents, and platform security. In some ways, these resources can encapsulate various legal or publisher-derived concerns away from developers, and for the platform holder it is another level of control over who can access protected content or secured values in the system (DRM keys, user financial details, etc.).
The downside for some of those concerns which can go towards Carmack's point is that addressing these is probably at best neutral as far as coded solution, and some of the elements like security can constrain developers in terms of flexibility or latency.

Perhaps one of the questions for this generation is whether the tools available for straddling these domains in light of a heightened threat environment and commercial pressures have improved.

This is the video I had lost track of over the years concerning Sony's initial review of the audio capabilities of the GPU. I don't think there's been a follow-up from them on this topic since.
There are some interesting tidbits not in the slide deck, such as a discussion of latency starting at 21:15 that shows some of the range in tolerance for latency (most demanding: 5ms or less), and how hardware or API limitations could push latencies into unacceptable ranges. In that regard, some of Carmack's objections are reinforced, with the counterpoint being that I fear some of their causes (DRM, security) will not be going away.

While there's some limited information on TrueAudio Next and processing time that might make it more acceptable than it was in 2013, without knowing if it's end-to-end latency for an audio task, and not knowing how flexible the audio pipeline can be, I'm unsure if the problem is yet solved. That's part of why the CPU can remain compelling, as it doesn't hop through various latency-adders and can more arbitrarily chain together tasks.
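For context on those numbers, here is a trivial sketch converting buffer sizes into per-hop latency at 48 kHz. The sample rate and buffer sizes are my assumptions; the 5 ms "most demanding" budget is the figure from the talk:

Code:
# Per-hop latency added by fixed-size audio buffers at 48 kHz (assumed rate).
# Every stage that buffers a block before processing adds at least this much,
# which is how chained effects can blow through a 5 ms budget quickly.
SAMPLE_RATE = 48_000

def buffer_latency_ms(frames):
    return frames * 1000.0 / SAMPLE_RATE

for frames in (64, 128, 256, 512):
    print(f"{frames:>3}-frame buffer: {buffer_latency_ms(frames):.2f} ms per hop")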

If security paranoia is even higher this time around, however, perhaps there will be barriers to that flexibility.
Less clear is whether there could be changes to the dedicated hardware paths to help them more efficiently tunnel around these barriers in a safe manner.


Why? SPUs should be ideal for audio - hence the number of lame gags and ill-informed descriptions treating them as just DSPs.
The Sony presentation did note there were deficiencies in the platform: it made demands on various developers to code their own DSP solutions, and among other things they might then use solutions with minimal decode complexity--which hurt memory footprint. There are also apparently licensing and other considerations that impacted the platform overall versus a competitor that had them as a given.

I mean, there are different levels of fanciness in how a game decides which samples to play and their properties (volume, pitch modulation, reverb, speed, etc.), but the actual playback of the samples is what I think can easily be HW accelerated with very little wasted silicon. I just don't think we need hundreds of voices. If a game wants to go that far, then it can of course do some of the audio in software and feed that software mix to one of the HW channels.
There can be costs like variable latency in chaining different effects together, which is where flexibility or not having to dive through security layers can help despite hardware inefficiency.
A low latency way of combining resources or making some of them more programmable may be in order.
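As a concrete illustration of the point above about feeding a software mix to hardware, here is a minimal sketch of a software submix: many game voices are summed on the CPU into one stereo buffer, and only that single buffer would then be handed to a hardware channel. No real console audio API is assumed; this is just the shape of the idea:

Code:
# Minimal software submix sketch: sum many voices into one interleaved stereo
# buffer, then hand that single buffer to a hardware channel. Illustrative
# only; no actual console audio API is assumed.
from array import array

def mix_voices(voices, frames):
    """voices: iterable of (mono_samples, gain, pan) tuples, where
    mono_samples is a sequence of floats in [-1, 1] and pan is 0..1."""
    out = array('f', [0.0] * frames * 2)       # interleaved L/R
    for samples, gain, pan in voices:
        for i in range(min(frames, len(samples))):
            s = samples[i] * gain
            out[2 * i] += s * (1.0 - pan)      # left
            out[2 * i + 1] += s * pan          # right
    return out                                 # one buffer -> one HW voice

# Example: two constant-level voices mixed into a 256-frame block.
block = mix_voices([([0.1] * 256, 0.8, 0.5), ([0.05] * 256, 1.0, 0.25)], 256)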

There have been AMD patents about latency-optimized hardware with more direct messaging with the host processor, but no indication on whether those would find use: http://www.freepatentsonline.com/y2018/0144435.html.
Although one thing it does have over some other patents is that the RTG chief architect's name is on it, so it may have rated more time and consideration than a defensive patent.

In previous generations Sony have tried unique memory configurations for consoles:

PS1 EDO DRAM
PS2 Rambus DRAM and EDRAM
PSV wide-band VRAM
PS4 unified GDDR5 pool
PS4 Pro GDDR5 + DDR3
Was EDO considered exotic? There were desktops that used it at the time.
Missing from that list is the PS3's XDR (Rambus) DRAM and GDDR3 split.
The PS4 also had DDR3 off the southbridge. The Pro increased it, which in part allowed some of the OS reserve to be reduced by copying some data over to that larger buffer. From the point of view of the OS, it looks more like DRAM hanging off of an IO device, rather than part of main memory.

The DDR3 in the PS4 and PS4 Pro doesn't really count; it was a storage cache on the southbridge and wasn't used by either the CPU or the GPU. Both the PS4 and the Pro were technically unified GDDR5, which has proven to be the most effective approach when dealing with a single SoC.
The DDR3 pool's primary reason for existing is to serve as RAM for the separate OS running on the southbridge.
Whether that particular platform quirk will continue on hasn't been discussed to my knowledge.

Theoretically, yes, but I have a hard time believing they couldn’t achieve similar results with memory die on the same side in a 2.5D package. I think a dual sided cooling solution would be more about improving the overall thermal sinking from main APU. The heatsink is not the only path of thermal relief for a die (there is some heat spreading in the board), but it’s certainly the best path.
Is this dual-sided cooling related to the patent where there's heatsink metal running up to the underside of the die? I feel like there needs to be some non-standard reason for not having enough space on the traditional side of the die for a good cooler. Meanwhile, the die is going to be challenged in terms of power delivery, given how the footprint for existing ~100W+ dies is typically maxed out with the IO and power/ground connections that would be creating the need for additional cooling in the first place.
 
Is this dual-sided cooling related to the patent where there's heatsink metal running up to the underside of the die? I feel like there needs to be some non-standard reason for not having enough space on the traditional side of the die for a good cooler. Meanwhile, the die is going to be challenged in terms of power delivery, given how the footprint for existing ~100W+ dies is typically maxed out with the IO and power/ground connections that would be creating the need for additional cooling in the first place.

The patent describes a variety of scenarios. One such situation is a multi-die package where components are embedded in, or placed on opposing sides of the package. That would create the need for heatsinks on both sides. Another application is one in which a radiating element is placed on the same side as the top of the PWB, and in this case you move the heatsink to the underside to avoid EMI. The third application is the one you posit, where a single die is cooled from both the top and the bottom.
 
The patent describes a variety of scenarios. One such situation is a multi-die package where components are embedded in, or placed on opposing sides of the package. That would create the need for heatsinks on both sides. Another application is one in which a radiating element is placed on the same side as the top of the PWB, and in this case you move the heatsink to the underside to avoid EMI. The third application is the one you posit, where a single die is cooled from both the top and the bottom.
It's one of their patents that includes a million "embodiments"; they put in so many different methods that it was probably from an R&D team looking at different ways to improve cooling for multiple reasons (thickness, layout, efficiency, etc.) but without it being for any particular product.

Using filled vias to conduct heat through a PCB has been done forever, and it's been more than enough for anything in the range of portable electronics. The only thing from the patent that catches my eye is the Tetris-like pieces going through the PCB while leaving enough room for the PCB traces to route around them. It could be used to increase cooling of some specific hot spots (are there difficult hot spots on each WGP? On Zen cores?), and it has potential for increasing overall cooling efficiency. The hard part is going through the package substrate too, but substrates have copper-filled vias, and the underfill can possibly be some heat-conducting epoxy. I don't know how these things are built.
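For a rough sense of what copper-filled thermal vias can buy, here is a quick conduction-only estimate. The board thickness, via diameter, and via counts are assumptions for illustration (not from the patent), and it ignores spreading in the copper planes and the package substrate entirely:

Code:
import math

# Conduction-only estimate for copper-filled thermal vias through a PCB.
# Dimensions and counts are illustrative assumptions, not from the patent.
K_COPPER = 400.0          # W/(m*K), bulk copper
BOARD_THICKNESS = 1.6e-3  # m, typical board
VIA_DIAMETER = 0.3e-3     # m, assumed fully filled with copper

def one_via_resistance():
    area = math.pi * (VIA_DIAMETER / 2) ** 2
    return BOARD_THICKNESS / (K_COPPER * area)   # K/W for a single via

for n in (10, 50, 200):
    r = one_via_resistance() / n                 # vias conduct in parallel
    print(f"{n:>3} vias under a hot spot: ~{r:.2f} K/W through the board")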
 