Next Generation Hardware Speculation with a Technical Spin [post E3 2019, pre GDC 2020] [XBSX, PS5]

Do we have any evidence that current-gen consoles actually use the ARM cores for low-power or background tasks?

On PS4, they only serve to monitor PSN and, when needed, wake up the system. All background downloads and installs are done once the APU, RAM and HDD have been woken up.

The ARM CPU sits in/near the southbridge, which lets it keep Ethernet/Wi-Fi active while the APU, RAM and HDD are in hibernation.
 
The PlayStation 4 family's memory structure is a bit of an odd one, considering games only have access to 4.5 GB of GDDR5 on PS4 and 5 GB of GDDR5 on the 4 Pro. I guess they could expand the PS5's alternate DDR4 memory pool enough to fit the entire OS there instead of only using it as a cache and swap scratch-pad.
Respectively 5GB and 5.5GB
 
Respectively 5GB and 5.5GB

No. It's as I said, 4.5 GB on PS4 and 5 GB on the 4 Pro for GDDR5 memory.

---------------------------------------------------------------
Getting Started > Programming Basics > Programming Startup Guide > 7. Resources That Can be Used by Applications > Memory

Direct Memory
Direct memory is guaranteed to be physically contiguous, and it is explicitly allocated/released/mapped by applications.

In Base mode, the size of direct memory that can be used by an application is 4.5 GiB (4608 MiB).

In NEO mode, the size of direct memory that can be used by an application is 5.0 GiB (5120 MiB).

Flexible Memory
Flexible memory is virtual memory. However, it is automatically locked by the kernel upon allocation and a page-out will not occur while the application is running.

In Base mode, the size of flexible memory allocated to an application is 512 MiB.

In NEO mode, the size of flexible memory allocated to the application is 512 MiB, the same as for Base Mode.

An application cannot secure all of the flexible memory that is allocated for each operation mode. Note that the following memory will be allocated from the flexible memory in advance before the main() function is called.
  • Footprint of the boot program (eboot.bin)
  • Memory allocated for heap areas of the C/C++ standard libraries (arbitrary sizes can be specified)
  • Buffer cache for the file system exclusive to the application (64 MiB)
Note
Other memory resources used by SDK libraries will also be subtracted from the resource amounts that can be used by applications.
 
That 512 MB is from GDDR5 and is available to games; it's only reclaimed when the game is suspended in the background. I thought Cerny explained this?

If they increase this next gen, they could free up more of the memory for app switching. The amount they can swap in and out right now is limited by the anemic single DDR3 chip on the southbridge. With a 4 GB/s SSD they could free up maybe 4 GB with a maximum of about 1 s of delay per app switch.

16 GB of GDDR6 could be split as:
6 GB for the OS and running apps (2 GB fixed and 4 GB indirect)
14 GB for games (10 GB direct alloc and 4 GB indirect)

That's a much more efficient use of the available physical memory than only allowing direct allocation for games and apps; handing out the same amounts without going through the OS allocator would require 20 GB of physical memory.
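A rough back-of-the-envelope of that split in Python (the pool sizes and the ~4 GB/s SSD figure are just this post's speculation, nothing confirmed):

```python
# Hypothetical next-gen memory budget from the post above: a 4 GB "indirect"
# (swappable) pool is shared between the OS/apps and the foreground game,
# so the two budgets overlap instead of adding up.
os_fixed_gb    = 2    # always-resident OS
indirect_gb    = 4    # swappable pool, handed to whichever side needs it
game_direct_gb = 10   # directly allocated by the game

os_budget   = os_fixed_gb + indirect_gb      # 6 GB visible to the OS/apps
game_budget = game_direct_gb + indirect_gb   # 14 GB visible to the game

physical = os_fixed_gb + game_direct_gb + indirect_gb  # 16 GB with sharing
naive    = os_budget + game_budget                     # 20 GB without it

ssd_gb_per_s = 4.0
swap_delay_s = indirect_gb / ssd_gb_per_s  # ~1 s to page the 4 GB pool in/out

print(f"OS budget {os_budget} GB, game budget {game_budget} GB")
print(f"Physical RAM needed: {physical} GB (vs {naive} GB naive)")
print(f"Worst-case app-switch swap: ~{swap_delay_s:.0f} s at {ssd_gb_per_s} GB/s")
```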
 
No. It's as I said, 4.5 GB on PS4 and 5 GB on the 4 Pro for GDDR5 memory.

However you read (interpret) the part about flexible memory and how the system allocates it, most of it (minus the 64 MB and such) is available to games during normal execution. 4608 MB maximum for games, you say?

Here's a Killzone Shadow Fall (a launch game) memory map example with 4736 MB used (1536 + 128 + 3072).

[image: Killzone Shadow Fall memory map]
 
No. It's as I said, 4.5 GB on PS4 and 5 GB on the 4 Pro for GDDR5 memory.

https://www.eurogamer.net/articles/...tation-4-pro-how-sony-made-a-4k-games-machine
Mark Cerny's exact quote said:
"when you stop using Netflix, we move it to the slow, conventional gigabyte of DRAM. Using that strategy frees up almost one gigabyte of the eight gigabytes of GDDR5. We use 512MB of that freed up space for games, which is to say that games can use 5.5GB instead of the five"
 
Does your Windows (or macOS) install feel laggy and unresponsive? I think the whole point is that the OS doesn't need the incredibly high bandwidth a game would demand. If anything, DDR4 is more suitable for OS-like tasks.

No, but Windows is using a dual-channel DDR4-3200 memory bus connected directly to my CPU, not a 32-bit LPDDR4 memory bus hanging off an ARM SoC on a southbridge.
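For a sense of scale, a rough peak-bandwidth comparison (the LPDDR4 speed grade is my assumption; nothing about a PS5 southbridge is confirmed):

```python
# Peak theoretical bandwidth = bus width (bytes) x transfer rate (MT/s).
def peak_bw_gb_per_s(bus_bits: int, mt_per_s: int) -> float:
    return bus_bits / 8 * mt_per_s / 1000

# Dual-channel DDR4-3200 on a desktop CPU: 2 x 64-bit channels.
ddr4 = peak_bw_gb_per_s(bus_bits=128, mt_per_s=3200)    # ~51.2 GB/s

# Hypothetical 32-bit LPDDR4-3200 hanging off a southbridge ARM SoC.
lpddr4 = peak_bw_gb_per_s(bus_bits=32, mt_per_s=3200)   # ~12.8 GB/s

print(f"Dual-channel DDR4-3200: {ddr4:.1f} GB/s")
print(f"32-bit LPDDR4-3200:     {lpddr4:.1f} GB/s")
```

And that says nothing about the extra latency of hopping through the southbridge, which matters more than raw bandwidth for anything latency-sensitive.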
 
It will probably be 36 CUs. I'm sticking with my original expectation of a smaller die with high clocks due to the economics of 7nm. I think if it's using RDNA 2 (or something in between) and 7nm EUV, the power efficiency can be increased significantly over what we see with the 5700 XT.
 
It will probably be 36 CUs. I'm sticking with my original expectation of a smaller die with high clocks due to the economics of 7nm. I think if it's using RDNA 2 (or something in between) and 7nm EUV, the power efficiency can be increased significantly over what we see with the 5700 XT.
Yep, with those clocks it must be 36 CUs. But even with only 36 CUs, they must have quite an elaborate cooling solution, because the heat density (to be dissipated) will be very high for a console.

I am really surprised about those clocks, actually. It's very different from the PS4, Pro and even Vita designs. Maybe they had to have 36 CUs because of PS4 and Pro BC? So with so few CUs they had to clock it very high?
 
Maybe they had to have 36 CUs because of PS4 and Pro BC?
That shouldn't be the case at all. GPU commands are dispatched and dealt with by the hardware schedulers. No code should have any idea what the CU configuration is and that should never impact whether code runs or not.

Just as there's an advantage in fewer, faster CPU cores over more, slower cores, is there a case for GPU workloads, especially compute, where higher clocks and less parallelism are better? Perhaps they chose 2 GHz and fewer CUs because they can, for the same total throughput as 1.5 GHz and 4/3 times as many CUs but with faster individual thread (warp) execution. Also, the schedulers and everything else will be running that much faster. [/theory]
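To put numbers on the "same total throughput" idea (all figures are illustrative, not leaked specs):

```python
# FP32 throughput of a GCN/RDNA-style GPU:
#   TFLOPS = CUs x 64 lanes x 2 ops (FMA) x clock (GHz) / 1000
def tflops(cus: int, clock_ghz: float) -> float:
    return cus * 64 * 2 * clock_ghz / 1000

narrow_fast = tflops(cus=36, clock_ghz=2.0)  # ~9.2 TF
wide_slow   = tflops(cus=48, clock_ghz=1.5)  # ~9.2 TF (4/3 the CUs at 3/4 the clock)

print(f"36 CU @ 2.0 GHz: {narrow_fast:.2f} TF")
print(f"48 CU @ 1.5 GHz: {wide_slow:.2f} TF")
# Same ALU throughput either way, but the narrow/fast configuration finishes
# each individual wavefront sooner and runs the fixed-function front end
# (command processor, rasterizers, geometry) a third faster.
```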
 
That shouldn't be the case at all. GPU commands are dispatched and dealt with by the hardware schedulers. No code should have any idea what the CU configuration is and that should never impact whether code runs or not.

Just as there's an advantage in fewer, faster CPU cores over more, slower cores, is there a case for GPU workloads, especially compute, where higher clocks and less parallelism are better? Perhaps they chose 2 GHz and fewer CUs because they can, for the same total throughput as 1.5 GHz and 4/3 times as many CUs but with faster individual thread (warp) execution. Also, the schedulers and everything else will be running that much faster. [/theory]
Or, instead of performance: the increased cost of cooling combined with higher clock speeds yields better cost/performance than having a larger die, for instance. Or the sweet spot of performance and price is around 36-40 CUs, with subsidies in place that will reduce over time, etc.
 
Zen 2 will be a massive jump in the consoles; however, at this point we know very well how important memory latency is for games on this architecture.
I want to know how Microsoft and Sony will, and can, achieve good enough latency not to hurt performance with a unified GDDR6 pool.
And as a bonus, I'd like to know whether the next-gen SoCs will have the halved write speeds present in some Ryzen SKUs, and if this can hurt the system even more.
 
Which is going to benefit most from the next lithographic advance: clock speed or die size? Maybe smaller and faster is forward-looking toward better cost savings on the next node shrink?
For TSMC, the major advance for 5nm will be in density (~1.8x), not frequency (a small percentage).
That's for low-power circuits, mind you. Benefits to HP designs aren't clear.
 
That shouldn't be the case at all. GPU commands are dispatched and dealt with by the hardware schedulers. No code should have any idea what the CU configuration is and that should never impact whether code runs or not.

Just as there's an advantage in fewer, faster CPU cores over more, slower cores, is there a case for GPU workloads, especially compute, where higher clocks and less parallelism are better? Perhaps they chose 2 GHz and fewer CUs because they can, for the same total throughput as 1.5 GHz and 4/3 times as many CUs but with faster individual thread (warp) execution. Also, the schedulers and everything else will be running that much faster. [/theory]
Interesting. What is everything else that could benefit from 2 GHz?

- Geometry processor
- Rasterizer
- Schedulers & dispatchers (graphics command processor)

What else?
 
For TSMC, the major advance for 5nm will be in density (~1.8x), not frequency (a small percentage).
That's for low-power circuits, mind you. Benefits to HP designs aren't clear.
Yes, and the clock upgrade at iso-power consumption on an optical shrink is 15%, whereas the power decrease at iso-clock on an optical shrink is 20%.

It's almost always better to go wider than to clock faster, especially on GPUs.

The only valid reason I'd see for the consoles going for >2GHz on the GPU is if they're spending more transistors on achieving higher clocks, and/or there's a lot of untapped clock potential for RDNA in 7nm that RTG didn't explore because of time/resource constraints.
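A crude dynamic-power sketch of why wider-and-slower usually wins (the voltage points below are made up for illustration; real V/f curves are measured per part):

```python
# Dynamic power scales roughly as P ~ N_CU * V^2 * f, and higher clocks
# usually need higher voltage, so frequency costs more than linearly.
def rel_power(cus: int, clock_ghz: float, volts: float) -> float:
    return cus * volts**2 * clock_ghz  # arbitrary units

wide_slow   = rel_power(cus=48, clock_ghz=1.5, volts=0.95)
narrow_fast = rel_power(cus=36, clock_ghz=2.0, volts=1.10)

print(f"48 CU @ 1.5 GHz, 0.95 V: {wide_slow:.1f}")
print(f"36 CU @ 2.0 GHz, 1.10 V: {narrow_fast:.1f}")
# Same ~9.2 TF either way, but in this toy model the narrow/fast part draws
# noticeably more power -- the pay-off being a smaller, cheaper die, which is
# exactly the trade-off being argued about here.
```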
 
Interesting. What is everything else that could benefit from 2 GHz?

- Geometry processor
- Rasterizer
- Schedulers & dispatchers (graphics command processor)

What else?
The polygon setup budget should be amazing, and the GPU has the necessary pixel throughput to shade them too if it has 64 ROPs.
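Rough front-end numbers at 2 GHz (the 4-primitives-per-clock figure is an assumption carried over from current RDNA parts, not anything announced):

```python
clock_ghz = 2.0
rops = 64
prims_per_clock = 4   # assumed, as on Navi 10

pixel_fill_gpix = rops * clock_ghz             # 128 Gpixels/s
prim_rate_gtris = prims_per_clock * clock_ghz  # 8 Gtris/s

print(f"Pixel fill:     {pixel_fill_gpix:.0f} Gpix/s")
print(f"Triangle setup: {prim_rate_gtris:.0f} Gtris/s")
# At 4K60 that's on the order of 16 set-up triangles per output pixel per
# frame in theory, before culling, overdraw and small-triangle losses.
```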
 