General Next Generation Rumors and Discussions [Post GDC 2020]

https://www.tomshardware.com/news/a...u-arden-source-code-stolen-100-million-ransom

AMD posted a press release to its website today announcing that it had found stolen graphics IP posted online, followed quickly by news from TorrentFreak that the information pertains to source code for Big Navi and Arden GPUs. TorrentFreak claims to have contacted the hacker responsible, who says the information is worth $100 million and is seeking bidders.

AMD has filed 'at least two DMCA notices' against GitHub repos that contained the stolen source code for the company's Navi 10, Navi 21, and Arden GPUs. The latter is arguably the most interesting, as it powers Microsoft's forthcoming Xbox Series X console, while Navi 21 is thought to be the design for the RDNA 2 'Big Navi' GPUs.
 
You’re going to have to give me an example of a game in which turning fast enough makes the texturing unable to keep up, so it loads in blurred before regaining focus. I haven’t seen it before. I feel like we’re discussing MIP quality instead of access speed.
Did you watch the presentation? He's talking about releasing the memory for everything 180° behind the player. No game has ever done that, because no storage has ever been fast enough to reload it as the player turns. It frees up half the asset memory. The limitation is how fast the player turns: there has to be enough time to reload pretty much everything over a full 180° turn.
 
Did you watch the presentation? He's talking about releasing the memory for everything 180° behind the player. No game has ever done that, because no storage has ever been fast enough to reload it as the player turns. It frees up half the asset memory. The limitation is how fast the player turns: there has to be enough time to reload pretty much everything over a full 180° turn.
Yes. Sorry. I didn’t realize you were referring to Cerny’s example.

At the same time, I don't know many games that would need that situation. The games in which we turn super fast have already solved that problem. In TLOU, UC4, and the other titles that put tons of effort into graphics, you have locked sensitivity on how fast you can move, traverse, and look around.

It's an interesting example, but I don't see that many use cases in which people are going to do a 180 here, unless you take control away from the player.

Back when we played Quake, CS, etc., all those games had 360 no-scopes bouncing everywhere, and we didn't have those issues at all back then. People had super high sensitivity and no one complained about texture load-in. We just held things in memory.
Look how fast we could turn; there weren't texture streaming problems back then.

Games are different now, of course, but for the use cases with this type of twitch action, where you need to look everywhere at once, they made those games and solved those problems by holding a lot more in memory.
 
Did you watch the presentation? He's talking about releasing the memory for everything 180° behind the player. No game has ever done that, because no storage has ever been fast enough to reload it as the player turns. It frees up half the asset memory. The limitation is how fast the player turns: there has to be enough time to reload pretty much everything over a full 180° turn.

You sure? I think I need to check GTA 4 or 5, because I remember playing an open-world game where you can readily have your view do a 180° flip to see what's behind you while driving in third-person mode.
 
You sure? I think I need to check GTA 4 or 5, because I remember playing an open-world game where you can readily have your view do a 180° flip to see what's behind you while driving in third-person mode.
That's keeping the LOD assets based on distance from the player; the game keeps 360 degrees of data in memory. (Sure, there are exceptions like car games and on-rails shooters; no idea about GTA's car driving.)

Cerny proposed cutting that in half, or possibly to a third if they keep only a 120° range in front. If it works, it would double or triple the asset detail possible at any moment. The technique is limited by the amount of data you can load within the time frame the player takes to turn around, because a full 180° potentially ends up loading a completely new dataset.

I'm saying it contradicts the opinion that SSD speed cannot improve IQ and only improves load times.
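As a rough back-of-the-envelope on how tight that turn window is (the 5.5 GB/s raw read figure is from the Road to PS5 talk; the turn time and working-set size are made-up numbers for illustration):

```python
# Rough model of the "drop what's behind you" idea. Turn time and working-set
# size are assumptions for illustration only; 5.5 GB/s is the raw PS5 SSD
# figure quoted in the Road to PS5 talk.

ssd_bandwidth_gb_s = 5.5        # raw sequential read, GB/s
turn_time_s = 0.5               # assume the player needs ~0.5 s for a 180° turn
visible_working_set_gb = 8.0    # assume ~8 GB of assets kept for the visible hemisphere

# Data that can be pulled in while the player is turning around.
streamed_during_turn_gb = ssd_bandwidth_gb_s * turn_time_s
print(f"Streamable during the turn: {streamed_during_turn_gb:.2f} GB")

# Fraction of the dropped hemisphere that could be replaced in that window.
fraction_refreshed = streamed_during_turn_gb / visible_working_set_gb
print(f"Fraction of the hemisphere refreshed in one turn: {fraction_refreshed:.0%}")
```

With those assumed numbers only part of the dropped data comes back in one quick turn, which is exactly the limitation described above: the working set, compression, and how fast the player can actually spin all decide whether the trick works.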
 
That's keeping the LOD assets based on distance from the player; the game keeps 360 degrees of data in memory.

Cerny proposed cutting that in half, or possibly to a third if they keep only a 120° range in front. If it works, it would double or triple the asset detail possible at any moment. The technique is limited by the amount of data you can load within the time frame the player takes to turn around, because a full 180° potentially ends up loading a completely new dataset.

I'm saying it contradicts the opinion that SSD speed cannot improve IQ and only improves load times.

Yep, that's what he said. Not that you need an SSD to do a 180. When playing GTA, a lot of stuff has to be loaded into your RAM that you may never use if you don't turn your character around, but it has to be there just in case. Cerny's argument is that the PS5 SSD is so fast that you don't actually need all those unused assets sitting in RAM; you only retrieve them from the SSD when necessary.
 
That's keeping the LOD assets based on distance from the player; the game keeps 360 degrees of data in memory.

Cerny proposed cutting that in half, or possibly to a third if they keep only a 120° range in front. If it works, it would double or triple the asset detail possible at any moment. The technique is limited by the amount of data you can load within the time frame the player takes to turn around, because a full 180° potentially ends up loading a completely new dataset.

I'm saying it contradicts the opinion that SSD speed cannot improve IQ and only improves load times.

With regard to this point, does anyone know if IQ could be substantially improved in a current-generation game by using an SSD, or is IQ currently limited by other factors as well in the current generation?
 
With regard to this point, does anyone know if IQ could be substantially improved in a current-generation game by using an SSD, or is IQ currently limited by other factors as well in the current generation?
I wonder too. Surely games with dynamic LOD can crank up the draw distance and the LOD bias; the streaming will be pretty much instantaneous compared to what the game was designed to stream from (a 50 MB/s HDD).
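For a sense of scale, here is how long a fixed streaming budget takes at the ~50 MB/s HDD figure versus faster drives (the 2 GB budget and the drive speeds are illustrative assumptions, not numbers from any particular game):

```python
# Time to stream an assumed per-area asset budget at different drive speeds.
budget_mb = 2048  # hypothetical streaming budget for one area
for name, mb_per_s in [("HDD ~50 MB/s", 50),
                       ("SATA SSD ~500 MB/s", 500),
                       ("NVMe ~5500 MB/s", 5500)]:
    print(f"{name}: {budget_mb / mb_per_s:.1f} s for {budget_mb} MB")
```

Going from ~41 s to well under a second per refill is why streaming systems tuned for an HDD look so conservative on a fast SSD.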
 
That's keeping the LOD assets based on distance from the player; the game keeps 360 degrees of data in memory. (Sure, there are exceptions like car games and on-rails shooters; no idea about GTA's car driving.)

Cerny proposed cutting that in half, or possibly to a third if they keep only a 120° range in front. If it works, it would double or triple the asset detail possible at any moment. The technique is limited by the amount of data you can load within the time frame the player takes to turn around, because a full 180° potentially ends up loading a completely new dataset.

I'm saying it contradicts the opinion that SSD speed cannot improve IQ and only improves load times.
It still needs to go into RAM, though. You're just not caching as much as you would with a slower HDD. So the solution is the same as today, but you don't need as large a buffer. There's an upper limit on texture quality as well: if the texture detail exceeds the output resolution, it's pointless.
 
That's keeping the LOD assets based on distance from the player; the game keeps 360 degrees of data in memory.

Cerny proposed cutting that in half, or possibly to a third if they keep only a 120° range in front. If it works, it would double or triple the asset detail possible at any moment. The technique is limited by the amount of data you can load within the time frame the player takes to turn around, because a full 180° potentially ends up loading a completely new dataset.

I'm saying it contradicts the opinion that SSD speed cannot improve IQ and only improves load times.

I won't deny that the SSD will improve streaming, because it will; there is no denying that increasing drive-to-RAM bandwidth by a minimum factor of 24 is going to have an effect.

However, having no texture-related data in RAM for what's behind you is not going to be absolutely true in most cases, and that's even if the SSD can completely keep up.

A lot of the texture data behind you will also be in your view in front. If you are outside, there will probably be no need to stream in ground or grass textures when you turn around; they are already in RAM.

Current systems already stream in the highest-resolution texture at the last possible moment, depending on the hardware. The level-0 mip alone takes up roughly three-quarters of a texture's full mip chain.
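The geometric-series arithmetic behind that figure, assuming a square texture where each mip level is a quarter the size of the one above it:

```python
# For a square texture, each mip level is 1/4 the size of the level above it,
# so the whole chain sums to level0 * (1 + 1/4 + 1/16 + ...) ~= level0 * 4/3.
level0 = 1.0
chain = [level0 / 4**i for i in range(12)]   # 12 levels covers a 2048x2048 texture
total = sum(chain)
print(f"Whole chain relative to level 0: {total:.4f}")              # ~1.3333
print(f"Level 0 as a fraction of the chain: {level0 / total:.1%}")  # ~75.0%
```

So keeping everything except the top mip resident costs only about a quarter of the full texture budget, which is what makes last-moment streaming of the top mip attractive.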

Plus, we have yet to see how hungry RT will be when it comes to memory footprint and bandwidth. Most of the memory savings may be used to accommodate RT. And streaming in textures faster than ever might not mean a whole lot if we drop down from 4K/1800p back to 1080p or 1440p. LOL
 
So why did they not stick to 8 GB total?

I don't think 8 GB is enough for the 4K framebuffer and some AA while also needing space for the OS and game logic. Some PC games go over 8 GB of VRAM (not including system memory) when certain post-processing effects are enabled. On top of that flexibility, not every game is going to be constantly streaming from the SSD.
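To put some rough numbers on the 4K buffer cost (the render-target list here is a made-up example, not any particular engine's layout):

```python
# Approximate memory for a handful of 3840x2160 render targets.
width, height = 3840, 2160
targets_bytes_per_pixel = {
    "albedo (RGBA8)": 4,
    "normals (RGBA16F)": 8,
    "HDR colour (RGBA16F)": 8,
    "depth (D32F)": 4,
    "motion vectors (RG16F)": 4,
}
total_mb = sum(bpp * width * height for bpp in targets_bytes_per_pixel.values()) / 1024**2
print(f"These five targets alone: ~{total_mb:.0f} MB at 4K")
```

That is only a couple of hundred MB for the targets themselves; add TAA history buffers, shadow maps, the OS reservation, and the asset pool on top, and 8 GB gets tight quickly.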
 
I'm really curious if the 36 CUs are purely for backwards compatibility or if it's also some degree of forward planning for a "Pro": extract the best performance possible from the smallest possible chip, with a view to doubling that chip, at a relatively economical level, within 3-5 years.

They already have the engineering work they've done on the PS4 Pro's "butterfly" design, and 72 CUs on 5nm might not be all that much bigger than the XSX's 360 mm² beast. I need to go and check the kind of area reduction 5nm will bring, though...

Given that they've gone with 14 Gbps GDDR6 *sigh*, a jump to 16 Gbps would make it relatively cheap to bump up the memory bandwidth too. I'd love some HBM and 1 TB/s of bandwidth there, though :love:
36 CUs would be easier to double, though there's less room below, since the Series X would sit at an intermediate position between the PS5 and a doubled Pro. Sony may need to consider whether doubling is enough, especially if there were a Pro variant of the Series X.
Going from 14 to 16 Gbps would be a scant upgrade, and proportionally weaker than the PS4 to Pro transition, with a ~14.3% bandwidth improvement stretched over 2x the CUs. Perhaps there would be an even faster interface speed, or a change in width, such as at least matching 320 bits, if not going wider.
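Plugging the numbers in (bandwidth = per-pin speed x bus width / 8; the 320-bit case is the Series X width mentioned above for comparison):

```python
# GDDR6 bandwidth from per-pin speed and bus width.
def gddr6_bandwidth_gb_s(gbps_per_pin: float, bus_bits: int) -> float:
    return gbps_per_pin * bus_bits / 8

for gbps, bus in [(14, 256), (16, 256), (14, 320), (16, 320)]:
    print(f"{gbps} Gbps x {bus}-bit: {gddr6_bandwidth_gb_s(gbps, bus):.0f} GB/s")

# Going from 14 to 16 Gbps on the same bus is only a ~14.3% uplift.
print(f"14 -> 16 Gbps uplift: {16/14 - 1:.1%}")
```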
Sony's variable clock solution might have some kind of impact on a future Pro, since we'd assume Sony wouldn't want to drop the clock. Raising the clocks could be interesting, though the current clocks are described as already being in an inefficient region. 72 or more CUs may be interesting versus a competing xPro if both are much larger in CU count but one is still striving for constant clocks. There may be some load scenarios where it's costlier or more difficult to hold a constant clock with many more active units.
What else could be scaled with a Pro console, such as the CPUs, might be an interesting question. Zen 2 seems to be a more successful initial implementation than Jaguar was, so the clocks currently given aren't artificially low. A 33% jump would give clocks of ~4.7 GHz, and node jumps in that clock range are often threatened with clock regressions. I don't know if they'd try for a clock bump, or if a non-standard number of additional cores could be an option.

Maybe the restriction is not a product of some issue with the actual configuration of the CUs, but rather that RDNA's CUs poorly mimic the performance of GCN's CUs in some form or fashion, and the frequency of an RDNA CU must be boosted to compensate.

In other words, 2.23 GHz isn't some consequence of having just 36 CUs but the other way around: BC requires RDNA CUs running at a high frequency to perform adequately across the board, and the frequency is high enough to limit the number of CUs that Sony can readily use in its design.
There is a Sony patent about varying clocks on the fly so that a new unit can emulate an older unit's performance, but with a true clock that is potentially faster and a spoof clock that the legacy software perceives as the original fixed clock.
https://patents.google.com/patent/US9760113B2/en
A mildly higher true clock could paper over any higher internal latencies with clock speed so that by the time the spoof clock has reached what the older code expects for forward progress, the emulated operation is done. 2.23 GHz versus 800 MHz or 911 MHz could be too much, but that might be why there are BC modes--whose clocks may still vary somewhat above the advertised base clock depending on the characteristics of what is running.
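A minimal sketch of my reading of that spoofed-clock idea, not code from the patent; the frequencies are illustrative (2.23 GHz is the announced PS5 cap, 911 MHz the PS4 Pro GPU clock), and the real mechanism would presumably live in hardware rather than software:

```python
# Toy model of a spoofed clock for BC: the hardware runs at a faster true clock,
# but counter reads from legacy code are rescaled so the software sees the old
# fixed frequency. Frequencies are illustrative, not confirmed BC-mode clocks.

TRUE_CLOCK_HZ = 2.23e9      # modern true clock (illustrative)
LEGACY_CLOCK_HZ = 0.911e9   # clock the legacy title was written against (PS4 Pro GPU)

def spoofed_counter(true_cycles: int) -> int:
    """Counter value reported to legacy software for a given number of true cycles."""
    return int(true_cycles * LEGACY_CLOCK_HZ / TRUE_CLOCK_HZ)

# After 1 ms of real time the hardware has run 2.23M cycles, but the legacy game
# is told only ~0.911M have elapsed, matching its original notion of time.
true_cycles = int(TRUE_CLOCK_HZ * 1e-3)
print(f"True cycles in 1 ms: {true_cycles}")
print(f"Cycles reported to the legacy title: {spoofed_counter(true_cycles)}")
```

The extra headroom between the true and spoofed clocks is what could hide any higher internal latencies from the old code.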

The lack of abstraction seems to mean that PS5 was limited to 36 CUs. How long will that limit extend? PS5 needs a decent abstraction so devs can't rely on specific CUs or register files, and have to access the GPU through a degree of abstraction so that it can be replaced. The old idea of hitting the console hardware directly is dead. For a platform with longevity in its library, abstraction is pretty essential.
The ISA and hardware itself have their own abstraction. The architecture promises certain outcomes or responses to various inputs, but whether those responses accurately depict what is happening internally is not information the software requires. Many values like the wave or CU ID are accessed with operations that read from system registers or privileged locations. The hardware can give an answer that is valid in terms of what is possible for the legacy software, even if the true answer in terms of the modern implementation is different.
CPUs running a VM can trap out guest requests for CPU or system information, where the hypervisor or a storage location that tracks the host vs guest relationship can patch in values appropriate for the guest.
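A toy illustration of that trap-and-patch pattern in software; the register names and values are invented for the sketch (18 CUs being what a PS4 title would expect, 36 what the native hardware reports):

```python
# Toy illustration of trap-and-patch: a "guest" query for a hardware ID is
# intercepted and answered with a value valid for the legacy platform, while
# native code sees the real value. Names and values are invented for the sketch.

HOST_INFO = {"cu_count": 36, "wave_id_bits": 10}   # what the real hardware would report
GUEST_VIEW = {"cu_count": 18, "wave_id_bits": 8}   # what the legacy title expects

def read_system_register(name: str, from_guest: bool) -> int:
    """Privileged read: guest requests get guest-appropriate values patched in."""
    table = GUEST_VIEW if from_guest else HOST_INFO
    return table[name]

print(read_system_register("cu_count", from_guest=True))   # legacy title sees 18 CUs
print(read_system_register("cu_count", from_guest=False))  # native code sees 36 CUs
```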


We don't, but if it was targeting price point first, why go chasing TFLOPS that may require extensive cooling solutions?
It's possible Sony's design may have had more pessimistic projections for the cost of 7nm wafers at the time the decision was made, so a bit less die area may have made more economic sense. It might have been considered easier to dial back on an over-engineered cooler with the next console hardware revision than it is to eat the cost of a die that would need to wait until 5nm for the next adjustment.
 
On subject of memory contention:

The little we know about Oberon is that the bus is 256 bits wide, which means AMD and Sony may encounter the same memory contention problem as Microsoft, but they have two ways to solve it:

  • The Microsoft method on Xbox Series X: assign some of the RAM addressing to the GPU, which allows a simpler uncore / Data Fabric and more CUs in return.
  • The AMD SoC method on PC: the GPU has no private data path and everything goes through the Data Fabric; there is no contention, but the uncore / Data Fabric grows accordingly and there is less space for the CUs.
With the second method, the GPU has access to all the RAM and can use all of the bandwidth, and there is no contention problem. The question is which approach Mark Cerny saw as most suitable, and this is important because a more complex uncore limits the number of Compute Units that fit in the available space. The advantage? Developers don't have to worry about working with two different memory spaces at the addressing level.

http://disruptiveludens.com/la-ultima-especulacion-sobre-ps5-antes-del-final
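As a deliberately simplistic illustration of the contention problem referred to above: when CPU and GPU share one bus, CPU traffic eats into the GPU's effective bandwidth. The overhead factor below is invented for illustration, not a measured figure for either console.

```python
# Toy contention model on a shared 256-bit, 14 Gbps bus (448 GB/s total).
TOTAL_BW_GB_S = 448.0
ARBITRATION_OVERHEAD = 1.3  # assumed: each GB/s of CPU traffic costs the GPU 1.3 GB/s

def gpu_bandwidth(cpu_traffic_gb_s: float) -> float:
    return TOTAL_BW_GB_S - cpu_traffic_gb_s * ARBITRATION_OVERHEAD

for cpu in (0, 10, 20, 40):
    print(f"CPU at {cpu:>2} GB/s -> GPU sees ~{gpu_bandwidth(cpu):.0f} GB/s")
```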
 
PCIe 4.0 drives are already hitting 5 GB/s, with 7 GB/s expected by the time the new consoles hit the market. PCIe 5.0 is due out in 2021, with first-gen drives likely hitting 10+ GB/s.

The drives may be capable of 10 GB/s theoretical transfer, but the filesystem, driver stack, and I/O chain will eat heavily into that. It's why SSDs in RAID arrays don't eliminate loading times on PC now. Even Stadia, running on server infrastructure, cannot do that.

Pulling data from the SSD puts it in main RAM, and data is often consolidated in .PAK files that need separating, possibly decompressing or even converting by the CPU. And anything for the GPU then needs to be transferred there. The PC is a vastly more flexible architecture, which necessitates more complexity, and that complexity introduces points for bottlenecks. I have no doubt that future PC architectures, with faster southbridges and faster local buses, will surpass consoles, but you have to brute-force through the complexity and bottlenecks, whereas the PS5 has a mad SSD connected to a mad controller that talks directly to a single pool of RAM connected to the GPU.
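A sketch of that point: the effective rate on the PC path is capped by the slowest stage between the drive and the GPU, not by the drive's headline number. The per-stage figures here are assumptions for illustration, not benchmarks.

```python
# Effective throughput is the minimum across the stages of the I/O chain.
stages_gb_s = {
    "NVMe raw read": 7.0,
    "filesystem + driver stack": 4.0,
    "CPU unpack/decompress of .PAK data": 2.5,
    "PCIe copy to the GPU": 6.0,
}
effective = min(stages_gb_s.values())
bottleneck = min(stages_gb_s, key=stages_gb_s.get)
print(f"Effective throughput ~{effective} GB/s, limited by: {bottleneck}")
```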
 
The drives may be capable of 10 GB/s theoretical transfer, but the filesystem, driver stack, and I/O chain will eat heavily into that. It's why SSDs in RAID arrays don't eliminate loading times on PC now. Even Stadia, running on server infrastructure, cannot do that.

Pulling data from the SSD puts it in main RAM, and data is often consolidated in .PAK files that need separating, possibly decompressing or even converting by the CPU. And anything for the GPU then needs to be transferred there. The PC is a vastly more flexible architecture, which necessitates more complexity, and that complexity introduces points for bottlenecks. I have no doubt that future PC architectures, with faster southbridges and faster local buses, will surpass consoles, but you have to brute-force through the complexity and bottlenecks, whereas the PS5 has a mad SSD connected to a mad controller that talks directly to a single pool of RAM connected to the GPU.

An example of lower efficiency was given during the Cerny presentation: because the NVMe standard only has two priority levels, you would need a ~7 GB/s drive to keep up with the PS5 controller. In the patent, though they did not talk about it in the presentation, the read unit is expanded, and SRAM helps with latency. There is a coherency engine, and GPU cache scrubbers help when writing data into the memory range the GPU will use to display things on screen. There is also the DMAC, and two coprocessors, probably ARM, managing the SSD.

Many custom things helping with efficiency.

 
The drives may be capable of 10 GB/s theoretical transfer, but the filesystem, driver stack, and I/O chain will eat heavily into that. It's why SSDs in RAID arrays don't eliminate loading times on PC now. Even Stadia, running on server infrastructure, cannot do that.

Indeed, but isn't DirectStorage designed to mitigate much of that?

That's not to say there aren't still advantages to the PS5's customizations (there obviously are), but it's not fair to judge by how today's SSDs perform on PCs that don't benefit from DirectStorage.
 
So you're telling me you can add a discrete GPU / more RAM via the SSD bay. :runaway:

https://forum.beyond3d.com/posts/2114618/

I am explaining that when we see data-driven rendering it will be a game changer. ;) Here the limits are memory size and SSD streaming... And it will help do things that are otherwise impossible in real time... This is Ubisoft R&D for the next generation.

EDIT: This is not something we will see this generation, which is maybe why people don't understand... ;-)

 