Next Generation Hardware Speculation with a Technical Spin [2018]

anexanhume · Sep 3, 2018

msia2k75 said:
I think such a configuration could be possible if they put an 80CU GPU inside (72 activated).
It would be big enough to fit a 384-bit bus on the PS5.
PS4 OG has a 20CU GPU (18 activated)
Pro doubles that -> 40CU (36 activated)
PS5 could double that again? 80CU (72 activated)
We will see...
One advantage could be that Sony could clock the GPU low (let's say around the same frequency as the XBX's GPU) and still obtain some decent flops (~11TF)

I think the XB1X may have shifted expectations here. It showed you can get higher clocks, and thus higher performance, at a fixed die size with a little more engineering effort in cooling and power delivery, so we might expect next-gen consoles to have clocks closer to those of desktop GPUs, as seen in the XB1X. At least, I could see Microsoft taking that approach. The question is whether they see the XB1X’s design as a true mass-market design or as too demanding to scale effectively.
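For reference, the ~11TF figure in the quote can be checked with the usual peak-FP32 formula for GCN-style GPUs (CUs x 64 lanes x 2 FLOPs per clock x clock speed). This is only a sketch: the shipped-console clocks are the publicly listed ones, and the 72-CU part is purely the hypothetical configuration from the quote.

```python
def peak_tflops(active_cus: int, clock_mhz: float,
                lanes_per_cu: int = 64, flops_per_lane_per_clock: int = 2) -> float:
    """Peak FP32 throughput in TFLOPS for a GCN-style GPU.

    Each CU has 64 lanes, and a fused multiply-add counts as 2 FLOPs per clock.
    """
    flops = active_cus * lanes_per_cu * flops_per_lane_per_clock * clock_mhz * 1e6
    return flops / 1e12

# Shipped consoles, as a sanity check on the formula.
ps4 = peak_tflops(18, 800)      # about 1.84 TF
pro = peak_tflops(36, 911)      # about 4.2 TF
xbx = peak_tflops(40, 1172)     # about 6.0 TF

# HYPOTHETICAL: the 72-CU part from the quote, clocked like the Xbox One X.
ps5_guess = peak_tflops(72, 1172)   # about 10.8 TF

print(f"PS4 {ps4:.2f} | Pro {pro:.2f} | XBX {xbx:.2f} | 72 CU guess {ps5_guess:.2f}")
```

Raising the clock rather than the CU count scales the same formula linearly, which is the trade-off being discussed here.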

mrcorbo · Sep 3, 2018

iroboto said:
Hmm. The discussion on storage speed is interesting. If loading times can be reduced dramatically at the cost of graphics performance, the trade-off may be worth it.

To give an example, if you’re heavy into MP games, most matches end in about 10 minutes. Then there are 1-2 minutes of matchmaking and another 1-3 minutes of loading, depending on the game you’re playing. Over the course of an hour, faster loading can probably net you 1 additional round. So if you’re getting 4 games per hour you might be able to get 5 games now, and up to 7 if you are capable of pub stomping and bringing matches down to 5 minutes.

Certain games where you die often and have to reload will be much more bearable. Wolf:TNO was a terrible offender here, with its 3-4 minute load time after death.
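The matches-per-hour estimate in the quote can be sketched as simple arithmetic. The midpoints are taken from the quoted ranges; the 30-second load time for fast storage is an assumption for illustration, not a measured figure.

```python
import math

def matches_per_hour(match_min: float, matchmaking_min: float, loading_min: float) -> float:
    """Match cycles that fit in an hour, where one cycle is match + matchmaking + loading."""
    return 60.0 / (match_min + matchmaking_min + loading_min)

MATCHMAKING_MIN = 1.5   # midpoint of the quoted 1-2 minutes

hdd_rate = matches_per_hour(10, MATCHMAKING_MIN, 2.0)    # loading: midpoint of 1-3 minutes
fast_rate = matches_per_hour(10, MATCHMAKING_MIN, 0.5)   # ASSUMPTION: 30 s loads on fast storage

hdd_full_matches = math.floor(hdd_rate)     # 4 complete matches per hour
fast_full_matches = math.floor(fast_rate)   # 5 complete matches per hour

print(hdd_full_matches, fast_full_matches)
```

With these inputs the result lines up with the "4 games per hour becomes 5" estimate in the quote.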

For me, decreased loading times are the least interesting thing that would come from having guaranteed, fast, storage. The potential game-changer is that it would better support a tiered virtual memory system.

Think of an analogy where you have a work surface and you store your materials in a storage closet across the room. You grab some materials and start working. Whenever you are finished you have two options. You can stop, remove the finished product from your workspace to make room for the next job, and then go to the closet to grab the next set of materials. Or you can section off some of your workspace and hire someone to constantly retrieve new materials, store them on your workspace (where you can easily access them), and remove completed or suspended work. This second model, however, limits the space you can use to actually do work, since you are using some of your workspace for storage.

Well, now we have the opportunity to add some shelves over and around your workspace. Instead of moving things off your workspace into the closet across the room, you can put them on a shelf near you and fully store the materials you need for the next few jobs (or one really big one) within easy reach. Because of this you can use your whole workspace to do work and not have large stoppages whenever you need to remove completed work or get new materials. An assistant could stand there, retrieve materials from the shelf and hand them to you as needed, and remove completed and suspended work as you go. Then a second assistant could be moving fully completed work from the shelves to the storage room and preloading the shelves with the materials for the next batch of jobs, while still being able to grab stuff for you from (or deliver stuff for you to) the closet directly when needed.

The tricky part here is whether your assistants are going to be smart enough to anticipate, on their own initiative, where the materials and products need to be for best efficiency on any job, or whether your supervisor (in this analogy representing the developer) needs to pre-define a specific workflow for each set of jobs.
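To make the analogy concrete, here is a toy sketch of the three tiers: workspace (RAM), shelf (fast flash) and closet (bulk storage). The class name, method names and slot counts are all made up for illustration; real hardware or OS paging would look nothing like this in detail.

```python
from collections import OrderedDict

class TieredStore:
    """Toy model: workspace (RAM) -> shelf (fast flash) -> closet (bulk storage)."""

    def __init__(self, ram_slots: int, shelf_slots: int):
        self.ram = OrderedDict()     # least recently used first
        self.shelf = OrderedDict()   # oldest first
        self.ram_slots = ram_slots
        self.shelf_slots = shelf_slots
        self.closet_trips = 0        # slow round trips to the far tier

    @staticmethod
    def _put(tier: OrderedDict, slots: int, key: str):
        """Insert key; return the evicted key if the tier overflowed, else None."""
        tier[key] = True
        tier.move_to_end(key)
        if len(tier) > slots:
            return tier.popitem(last=False)[0]
        return None

    def prefetch(self, key: str) -> None:
        """Second assistant: stage an item on the shelf before it is needed."""
        if key not in self.ram and key not in self.shelf:
            self._put(self.shelf, self.shelf_slots, key)   # overflow falls back to the closet

    def fetch(self, key: str) -> str:
        """Bring key into RAM and report which tier served it."""
        if key in self.ram:
            self.ram.move_to_end(key)
            return "ram"
        if key in self.shelf:
            del self.shelf[key]
            source = "shelf"
        else:
            self.closet_trips += 1
            source = "closet"
        demoted = self._put(self.ram, self.ram_slots, key)
        if demoted is not None:
            # First assistant: suspended work goes to the shelf, not the closet.
            self._put(self.shelf, self.shelf_slots, demoted)
        return source

store = TieredStore(ram_slots=2, shelf_slots=4)
store.prefetch("job_b")
served = [store.fetch(k) for k in ["job_a", "job_b", "job_c", "job_a"]]
print(served, store.closet_trips)   # ['closet', 'shelf', 'closet', 'shelf'] 2
```

The prefetched job and the job that was demoted from RAM are both served from the shelf, so only two of the four requests pay for a trip to the closet. Who decides what to prefetch is exactly the open question raised above.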

Lalaland · Sep 3, 2018

OK so here is where I look forward to being corrected but aren't tiered virtual memory systems a mitigation against inadequate or variable amounts of main memory?

It makes sense for Windows to provide a robust memory manager, as any given application could have wildly different main memory requirements (e.g. a RAW image workflow versus quickly cropping a JPEG) on wildly different hardware with differing amounts of RAM. As a Windows application dev I don't want to worry about the many, many possible configurations for my image editor, so I simply make load calls on the O/S and let it worry about what to purge or load as necessary. In a console, though, I have one fixed amount of RAM and a desire/need to custom manage my memory space directly, e.g. by flushing Level_1 unique assets to load Level_2 unique assets while keeping the common assets in RAM. This level of specificity is not typically supported by the O/S (afaik); rather, the O/S relies on cruder age metrics and disk proximity to speculate on purges/loads. That renders it not very useful on a console, where I may be bounding across the open world for 20 minutes before I decide to pop open the map. An O/S memory algo may well have decided, because I just passed a goat farm, to purge the map image based on age and load goat_texture_6 in case I went over, causing a memory miss on my map image and forcing a painful pause and load instead.
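The goat farm scenario can be sketched with a tiny age-based (LRU) cache versus one where the game pins the map image explicitly. Everything here (the AssetCache class, the capacity of 3, the asset names other than the ones from the post) is hypothetical and only meant to show the difference in behaviour.

```python
from collections import OrderedDict

class AssetCache:
    """Fixed-size asset cache with LRU eviction and optional developer pinning."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.assets = OrderedDict()   # least recently used first
        self.pinned = set()
        self.misses = 0

    def pin(self, name: str) -> None:
        """Developer hint: never evict this asset."""
        self.pinned.add(name)

    def touch(self, name: str) -> None:
        """Use an asset, loading it (a miss) if it is not resident."""
        if name in self.assets:
            self.assets.move_to_end(name)
            return
        self.misses += 1
        self.assets[name] = True
        while len(self.assets) > self.capacity:
            # Evict the oldest asset that the developer has not pinned.
            victim = next((n for n in self.assets if n not in self.pinned), None)
            if victim is None:
                raise MemoryError("everything resident is pinned")
            del self.assets[victim]

def roam_past_goat_farm(cache: AssetCache) -> int:
    cache.touch("map_image")
    for i in range(1, 7):
        cache.touch(f"goat_texture_{i}")
    cache.touch("map_image")          # player finally opens the map
    return cache.misses

generic_misses = roam_past_goat_farm(AssetCache(capacity=3))

explicit = AssetCache(capacity=3)
explicit.pin("map_image")
explicit_misses = roam_past_goat_farm(explicit)

print(generic_misses, explicit_misses)   # 8 7
```

The extra miss in the generic case is the reload of the map image, i.e. the "painful pause" described above.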

Edit: As an example, I've seen NVMe latencies quoted in the 2800ns (2.8us) range, but GDDR5 is in the 10ns range. That is still more than two orders of magnitude of difference between the two, so NVMe NAND is still too high-latency to serve as a practical "warm storage" pool. Only Intel's XPoint is a viable non-volatile alternative to RAM, and it's still quite a bit slower than SDRAM.

As I said much of this knowledge is half remembered from college and articles on what mad stuff folks had to do back in the day with painfully limited RAM budgets so I look forward to seeing the replies

Shifty Geezer · Sep 3, 2018

Lalaland said:
OK so here is where I look forward to being corrected but aren't tiered virtual memory systems a mitigation against inadequate or variable amounts of main memory?

Actually they're mitigation against inadequate register/L1 cache sizes.

If you could have all your data in storage as fast as local cache, that'd be the ideal. As that's not possible, each step trades immediacy for capacity.

If you use your numbers, at the moment there's 10ns latency with GDDR, and several ms for an HDD, so there's a performance chasm there of several hundred thousand times that flash sits nicely between.
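As a rough check on the size of each step, here are the ratios using the figures quoted in this thread. The 5 ms HDD value is an assumption standing in for "several ms", and the result is quite sensitive to it.

```python
LATENCY_NS = {
    "GDDR5": 10,            # figure quoted in the thread
    "NVMe flash": 2_800,    # figure quoted in the thread
    "HDD": 5_000_000,       # ASSUMPTION: 5 ms standing in for "several ms"
}

def slowdown(slower: str, faster: str) -> float:
    """How many times higher the latency of `slower` is than that of `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]

ram_to_flash = slowdown("NVMe flash", "GDDR5")   # 280x
flash_to_hdd = slowdown("HDD", "NVMe flash")     # roughly 1,800x
ram_to_hdd = slowdown("HDD", "GDDR5")            # 500,000x

print(f"RAM->flash {ram_to_flash:,.0f}x | flash->HDD {flash_to_hdd:,.0f}x | RAM->HDD {ram_to_hdd:,.0f}x")
```

On these numbers flash does not sit in the geometric middle of the gap, but it removes most of the distance between RAM and spinning disk.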

Silent_Buddha · Sep 3, 2018

MrFox said:
384 would be great, but I'm wondering why there's never been a mid-range gpu with a 384 gddr bus. There must be a large inherent cost associated? Impact on yield?

There is a silicon cost associated with it due to the area it takes up on the edge of the chip for the memory interface. As the discrete GPU gets smaller, there is less edge space for the memory interface, hence it's only used on larger discrete GPUs.

That shouldn't be a problem for a console SOC which is generally significantly larger than a mid-range discrete GPU as seen with the SOC in the PS4/PS4-P/XBO/XBO-X (one of which actually uses a 384 bit bus) compared with traditional mid-range GPUs in the PC space. Even the PS4 SOC at ~348 mm^2 is larger than most chips that were generally used in discrete GPUs.
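For a sense of what the wider bus buys, peak bandwidth is just bus width in bytes times the per-pin data rate. A small sketch using the publicly listed GDDR5 data rates for the PS4 and Xbox One X, and assuming the usual 32-bit interface per GDDR5 chip:

```python
def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: (bus width in bytes) x (data rate per pin in Gbps)."""
    return bus_bits / 8 * gbps_per_pin

def chips_needed(bus_bits: int, bits_per_chip: int = 32) -> int:
    """Minimum number of memory chips needed to populate the bus (32-bit GDDR5 devices)."""
    return bus_bits // bits_per_chip

ps4 = bandwidth_gb_s(256, 5.5)     # 176 GB/s
xbo_x = bandwidth_gb_s(384, 6.8)   # about 326 GB/s

print(f"PS4:   {ps4:.1f} GB/s over {chips_needed(256)} chips")
print(f"XBO-X: {xbo_x:.1f} GB/s over {chips_needed(384)} chips")
```

The wider bus costs die edge for the interface and also four extra memory chips on the board, which is part of why it tends to show up only on larger parts.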

Regards,
SB

Lalaland · Sep 3, 2018

Shifty Geezer said:
Actually they're mitigation against inadequate register/L1 cache sizes.

Fair point

Oh yeah, you're basically increasing latency by a factor of 1000 with each step, but each of those steps is pretty damn huge and impractical to use as an intermediate stage that includes every tier. In other words, I could see benefit in pulling a complete game onto the NVMe, but there's little benefit in pulling 20% of it (as you would with a true tiered solution), and even when you pull 100% of the game onto the NVMe it's still far too slow to rely on a generic O/S algo to manage stores and loads. So I don't think a virtual memory scheme that allows devs to pretend they have more RAM in the box than they do would offer much benefit to console users. Even today, the first thing to do for better perf on your PC is to ensure you get as little memory thrashing as possible, whether that's between your disk and RAM or your GPU and RAM.

To come back to an earlier question I had in a post: even if Sony/MS decide to go with a hybrid setup, will they need to offer specific API calls to allow the game maker to specify which bits of the game to pull onto NVMe first? Or will we just deal with crappy load times for the first 5-10 mins of a game until all of the game is on the NVMe?

milk · Sep 3, 2018

Lalaland said:
Fair point

Oh yeah, you're basically increasing latency by a factor of 1000 with each step, but each of those steps is pretty damn huge and impractical to use as an intermediate stage that includes every tier. In other words, I could see benefit in pulling a complete game onto the NVMe, but there's little benefit in pulling 20% of it (as you would with a true tiered solution), and even when you pull 100% of the game onto the NVMe it's still far too slow to rely on a generic O/S algo to manage stores and loads. So I don't think a virtual memory scheme that allows devs to pretend they have more RAM in the box than they do would offer much benefit to console users. Even today, the first thing to do for better perf on your PC is to ensure you get as little memory thrashing as possible, whether that's between your disk and RAM or your GPU and RAM.

To come back to an earlier question I had in a post: even if Sony/MS decide to go with a hybrid setup, will they need to offer specific API calls to allow the game maker to specify which bits of the game to pull onto NVMe first? Or will we just deal with crappy load times for the first 5-10 mins of a game until all of the game is on the NVMe?

The right way to do this is to let devs handle it explicitly, and give out libraries that handle it (sort of suboptimally) for those who don't have time to do it on their own. Small devs will be using Unity or Unreal anyway, and those will probably provide some sort of flag system to control what has to stay in a certain part of the memory hierarchy, and when.
Remember that most modern games already have that type of scheme in place. Even linear games make extensive use of streaming, for everything from audio to level geo, textures and logic. A lot of the fast 8GB of RAM on the PS4 is being used to cache streamed data that might only possibly be needed in the next 5 seconds but had better be loaded already, because data is loaded in large chunks from the HDD to make the most of the disk's seek times.
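A quick sketch of why streaming systems read in large chunks from a hard drive: every read pays a seek first, so small reads spend almost all their time seeking. The 12 ms seek and 100 MB/s sequential rate are assumed ballpark figures for a laptop-class HDD, not measurements from a console.

```python
def effective_mb_s(chunk_mb: float, seek_ms: float, sequential_mb_s: float) -> float:
    """Throughput actually achieved when every chunk costs one seek plus its transfer time."""
    transfer_s = chunk_mb / sequential_mb_s
    return chunk_mb / (seek_ms / 1000.0 + transfer_s)

SEEK_MS = 12.0      # ASSUMPTION: average seek + rotational latency
SEQ_MB_S = 100.0    # ASSUMPTION: sustained sequential read rate

small_reads = effective_mb_s(0.064, SEEK_MS, SEQ_MB_S)   # 64 KB per read
large_reads = effective_mb_s(16.0, SEEK_MS, SEQ_MB_S)    # 16 MB per read

print(f"64 KB reads: {small_reads:.1f} MB/s, 16 MB reads: {large_reads:.1f} MB/s")
```

Under these assumptions the large reads get most of the drive's sequential rate while the small ones get around a twentieth of it, which is why RAM ends up holding big speculative chunks. With flash the seek term shrinks by orders of magnitude, so smaller and later reads become viable.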

iroboto · Sep 3, 2018

mrcorbo said:
For me, decreased loading times are the least interesting thing that would come from having guaranteed, fast, storage.

Yea, from a programming perspective faster storage will definitely open up some possibilities for changing the pipeline. But the ultimate result of that would likely be better performance/graphics in some form.
I was trying more or less to define generational differences. This generation added standby. There’s better graphics and better online play, but to me the biggest features added (not iterated) this gen range from sharing content to standby mode.

When I think about how our society has less time to sit down for longer gaming sessions, standby was likely a huge bonus for players who had 20-30 minute windows to play; with standby they didn’t lose 6 minutes or so waiting for loading.

On that thought, however, live games don’t work with standby. Each time you log off, you’re booted and are required to reload everything. I’m looking at the biggest games, from Fortnite to Destiny and other titles that match this, and the need for fast load times has never been greater. If you’re looking to complete an activity in a small window of time, live games aren’t going to work well.

Loading times are likely among the least interesting of features, but for gamers looking to get in more games or more game time, it is perhaps the biggest feature yet.

Imagine this: I’ve played 500 matches of PVP in Destiny. Quite small, relatively speaking. Each match is 10-12 minutes. Load time is about 2-3 minutes, matchmaking 1-2 minutes.

That means I’ve spent about 500*4 minutes waiting, doing nothing. 2000 minutes... or 33 hours of my life watching this stupid spaceship fly in a loading screen. I could finish some single-player adventures, like 3 AAA titles, in that time frame.
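The waiting-time estimate above works out as follows, using the ranges from the post. The last line assumes, purely for illustration, that fast storage cuts loading to 30 seconds while matchmaking stays the same.

```python
def waiting_hours(matches: int, load_min: float, matchmaking_min: float) -> float:
    """Total hours spent waiting (loading + matchmaking) across a number of matches."""
    return matches * (load_min + matchmaking_min) / 60.0

MATCHES = 500

low = waiting_hours(MATCHES, 2.0, 1.0)     # best case from the post: 25 hours
mid = waiting_hours(MATCHES, 2.5, 1.5)     # 4 minutes per match: about 33 hours
high = waiting_hours(MATCHES, 3.0, 2.0)    # worst case from the post: about 42 hours

# ASSUMPTION: loading drops to 30 seconds, matchmaking is unchanged.
with_fast_storage = waiting_hours(MATCHES, 0.5, 1.5)

print(f"{low:.0f}-{high:.0f} h waiting (mid {mid:.1f} h); "
      f"with fast storage about {with_fast_storage:.1f} h")
```

Matchmaking time is untouched by storage speed, so even an ideal drive only removes the loading share of the wait.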

mrcorbo · Sep 3, 2018

Lalaland said:
Fair point

Oh yeah, you're basically increasing latency by a factor of 1000 with each step, but each of those steps is pretty damn huge and impractical to use as an intermediate stage that includes every tier. In other words, I could see benefit in pulling a complete game onto the NVMe, but there's little benefit in pulling 20% of it (as you would with a true tiered solution), and even when you pull 100% of the game onto the NVMe it's still far too slow to rely on a generic O/S algo to manage stores and loads. So I don't think a virtual memory scheme that allows devs to pretend they have more RAM in the box than they do would offer much benefit to console users. Even today, the first thing to do for better perf on your PC is to ensure you get as little memory thrashing as possible, whether that's between your disk and RAM or your GPU and RAM.

AMD have a solution for this exact scenario with an initial implementation that is active in Vega. The tricky part, as I said, is making it work well without the developer having to direct it. They should always have the option, of course, and preferably there are gradations of control so you can choose fully automatic, QoS tagging or direct control or combinations of any and all of those.

Silent_Buddha · Sep 3, 2018

mrcorbo said:
For me, decreased loading times are the least interesting thing that would come from having guaranteed, fast, storage. The potential game-changer is that it would better support a tiered virtual memory system.

Not just loading times, but time to save games as well. Although it may not be entirely due to the HDD in the PS4 (and Pro), it is interesting to note that while on PS4 it takes Yakuza 0 multiple seconds to save a game (and why does it need to save the game twice?), it happens almost instantly on an SSD in Steam (with cloud saves enabled).

Or to put it another way, the storage subsystem on consoles needs significant work, IMO, in addition to getting faster storage systems.

Regards,
SB

Lalaland · Sep 3, 2018

I've noticed this with Yakuza 0, but I think it's actually a consequence of a Sony default save system. At least, I've been seeing that same awful interface, with its multiple yes/no prompts, since the PS3 days from multiple Japanese devs, not just Sega.

Shortbread · Sep 4, 2018

Rumor...

Expect an 8C/16T processor in the form of a Ryzen 7 2700-esque offering in the new PS5. Separate GPU and CPU? I've been told to expect a discrete GPU, but that would mean the PS5 has some beast-like hardware... even on PC standards as we'll only be getting 7nm GPUs with Navi.

Let's say, for the sake of argument, that Tweaktown has some actual insider knowledge of a discrete GPU & CPU setup, rather than an APU design, for PS5. Should we expect higher frequencies/clocks from a discrete design? I would think that, not being an APU, hitting PC-like clocks around 3.2GHz (50-60W TDP) with a next-gen Ryzen CPU should be easier. And GPU clocks hitting around 1300-1500MHz would be nice as well, assuming Sony has designed a better heat-pipe (or vapor-chamber) heatsink for PS5.

anexanhume · Sep 4, 2018

Shortbread said:
Rumor...

Let's say, for the sake of argument, that Tweaktown has some actual insider knowledge of a discrete GPU & CPU setup, rather than an APU design, for PS5. Should we expect higher frequencies/clocks from a discrete design? I would think that, not being an APU, hitting PC-like clocks around 3.2GHz (50-60W TDP) with a next-gen Ryzen CPU should be easier. And GPU clocks hitting around 1300-1500MHz would be nice as well, assuming Sony has designed a better heat-pipe (or vapor-chamber) heatsink for PS5.

My thinking is more discrete parts means they’re scared of big die yield on 7nm. That makes me think 2019 launch rather than 2020. Large grain of salt added, of course.

McHuj · Sep 4, 2018

Maybe both the CPU and GPU as separate dies on an interposer?

msia2k75 · Sep 4, 2018

McHuj said:
Maybe both the CPU and GPU as separate dies on an interposer?

MCM... They won't go back to pre-PS4 design.

itsmydamnation · Sep 4, 2018

msia2k75 said:
MCM... They won't go back to pre-PS4 design.

If they go full 16x PCIe 4 and there are further extensions to HBCC, it could still work quite well. Remember Vega 20 is supposed to have GMI links, and we might even see an MCM of it with a CPU SOC, so I imagine AMD are well advanced on how they plan to handle coherency/data transfer.
It could also make a small HBM stack attached directly to the GPU an option, with slower memory on the CPU side, but I think that will come down to packaging techniques being ready, etc.
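For scale, here is what a full x16 link would offer per direction, computed from the PCIe signalling rates and 128b/130b encoding. Protocol overhead is ignored, so real throughput would be somewhat lower; the PS4 figure is its listed GDDR5 bandwidth, included only for comparison.

```python
def pcie_gb_s(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Per-direction PCIe bandwidth in GB/s, before packet/protocol overhead."""
    return gt_per_s * encoding * lanes / 8

gen3_x16 = pcie_gb_s(8, 16)     # PCIe 3.0: 8 GT/s per lane
gen4_x16 = pcie_gb_s(16, 16)    # PCIe 4.0: 16 GT/s per lane

PS4_GDDR5_GB_S = 176.0          # unified memory bandwidth of the PS4 APU, for scale

print(f"PCIe 3.0 x16: {gen3_x16:.2f} GB/s, PCIe 4.0 x16: {gen4_x16:.2f} GB/s")
print(f"PS4 memory bus is about {PS4_GDDR5_GB_S / gen4_x16:.1f}x a PCIe 4.0 x16 link")
```

That gap is why a split design would want each chip to have its own local memory pool, with the link used for coherency traffic and transfers rather than as the main path to memory.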

Shifty Geezer · Sep 4, 2018

Has any meaningful benefit to close CPU<>GPU communication ever been proven? Is the direct bus of the APU making a difference?

DieH@rd · Sep 4, 2018

Tweaktown reports on AMD's 7nm plans, with some fresh [first concrete?] PS5 insider info
https://www.tweaktown.com/news/6303...avi-gpu-architecture-7nm-portfolio/index.html

"Navi on 7nm should be unveiled somewhere in 2H 2019, where I'm sure we'll see the bigger picture of Navi closer to Computex 2019 and even beyond. It will surely have to be as close as humanly possible to the unveiling of the PS5... because as soon as we know details about Navi, we'll be able to piece together how good the PS5 will be. Expect an 8C/16T processor in the form of a Ryzen 7 2700-esque offering in the new PS5. Separate GPU and CPU? I've been told to expect a discrete GPU, but that would mean the PS5 has some beast-like hardware... even on PC standards as we'll only be getting 7nm GPUs with Navi."

Initial batch done with interposer with redesigns down the line with APU? Or forever stuck with separate chips on mobo?

Shifty Geezer · Sep 4, 2018

DieH@rd said:
Initial batch done with interposer with redesigns down the line with APU? Or forever stuck with separate chips on mobo?

Depends on where lithography goes. Sony may be looking at a very limited set of cost-saving die-shrink options (same as everyone else) and to ensure decent yields of large parts, is deciding to keep the dies split. If combining them down the line becomes a sensible option, they'll do so. Otherwise, they won't.

And the persistently high price of PS3 and PS4 suggests they are happy to keep the entry level price higher over the full life of the platform.

Edit: Actually, that's not really what you were asking.

Are they going to gamble on an interposer with a view towards an APU, or do two dies forever (MCM as the long-term option)? I'd think the second. Given how cost scaling isn't happening these days, with the latest nodes costing as much as or more than the larger nodes, the expectation would be that an economical single-chip solution won't be realistic within 5 years of launch. If you're going to go max performance, and so need two larger dies, find the cheapest way to do that. More silicon up front would be with a view to a more powerful launch unit and a longer life, IMHO.

Xbat · Sep 4, 2018

Discrete GPU would be a surprise but kind of makes sense if you consider yields should be better with two smaller chips than one large one?

20TFlops here we come
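The yield argument can be sketched with the simple Poisson yield model, Y = exp(-A * D0). The defect density and the 350 mm^2 total area are assumptions chosen for illustration (roughly PS4-SoC-sized silicon on an immature node), not known figures for any 7nm process.

```python
import math

def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson yield model Y = exp(-A * D0)."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)

D0 = 0.5             # ASSUMPTION: defects per cm^2 on an immature node
TOTAL_MM2 = 350.0    # ASSUMPTION: total silicon per console

yield_big = poisson_yield(TOTAL_MM2, D0)          # one monolithic die
yield_small = poisson_yield(TOTAL_MM2 / 2, D0)    # each of two half-size dies

# Silicon that has to be fabricated per working console. With split dies, good
# CPU and GPU dies can be tested and paired independently.
area_mono = TOTAL_MM2 / yield_big
area_split = 2 * (TOTAL_MM2 / 2) / yield_small

print(f"yield: big {yield_big:.1%}, small {yield_small:.1%}")
print(f"mm^2 per good console: monolithic {area_mono:.0f}, split {area_split:.0f}")
```

This ignores the extra packaging cost and the die area spent on the inter-chip link, and the advantage shrinks as defect density falls, which fits the idea of merging the dies later if the process matures.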
