General Next Generation Rumors and Discussions [Post GDC 2020]

That's more like it, but according to Windows Central, BCPack will mitigate that difference largely. The SSDs might perform closer to each other than some believe.
According to a Crytek developer, resuming a game is like 6s vs <1s, so not so close ;)
 
According to a Crytek developer, resuming a game is like 6s vs <1s, so not so close ;)
Both consoles resume in under 1s right now, so he's referring to complete game switches out of memory. Resuming in 1s while holding 6 different games is very impressive.

Do we know this for a fact? I'm very sceptical it's for backwards compatibility reasons.
It's not like they don't run with fewer CUs for legacy PS4 base emulation.
We don't know for sure. It may be for legacy PS4 mode, but most games should run in PS5 native mode. We have no details on the other two modes (legacy PS4 Pro and legacy PS4).
 
So question on BCPack... Does what is "BCPack(ed)" need to be decompressed for GPU use or is it a new native GPU format?
 
We don't know for sure. It may be for legacy PS4 mode, but most games should run in PS5 native mode. We have no details on the other two modes (legacy PS4 Pro and legacy PS4).

Would they compromise their design just to have boosted backwards compatibility when they could have had legacy PS4 mode do the work?
I feel it's more to do with cost reduction over time, or that the resources would be better used elsewhere, such as on the SSD.
 
Do we know this for a fact? I'm very sceptical it's for backwards compatibility reasons.
It's not like they don't run with fewer CUs for legacy PS4 base emulation.
They are following multiples of the PS4's CU count, though. Their BC method is probably tied to it for whatever reason. Their other option was probably 64 CUs, which was likely avoided due to cost.
 

Windows Central is referencing this tweet.
This guy is a compression expert, based on his history.
That's my point from before: it's not their own sourcing, they're reporting on what is being said on the net.
That's an important distinction when using them as a source.

He does seem to be an expert, but I also have no idea if what he says, or how much of it, is in any way correct in relation to XSX.

If he's right and it allows partial texture loads instead of the full texture, compared to possibly the PS5, that can help with effective throughput.
 
Thought experiment - assuming that the basic premise of the tweets was correct:

The likely places Series X could fall behind are: SSD throughput/latency, GPU bandwidth, GPU front-end (clock speed).

I can't think of anything else that would be really obvious.

I'm just speaking hypothetically, but thinking about GPU bandwidth, the only thing I can come up with is that the ratio of memory access is not quite right. Maybe they need a little bit more than the 10GB for fast GPU accesses.

The rest of my thoughts really come down to bandwidth mitigation, the memory model and how things are streamed from the NVMe. They are using some kind of virtual memory setup where the NVMe is addressable through a new API called DirectStorage. Maybe DirectStorage is a solution to CPU overhead and latency, but it comes with complexity. For example, on PS5 maybe there's just one way to access the drive through the filesystem: a basic open, read, write, close asynchronous or synchronous API like any other API. Maybe DirectStorage has a different programming model, so accessing data from the CPU and from the GPU are different. There could be unwanted complexity there. On top of the raw bandwidth disadvantage, maybe the programming model is just harder to use.
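To make that contrast concrete, here's a purely hypothetical sketch of the two models I have in mind. None of these names are real DirectStorage (or PS5) calls - it's only to show how a plain filesystem read differs from a queue-based request API that can target GPU memory and a hardware decompressor.

```cpp
// Purely hypothetical sketch of the two programming models being contrasted.
// None of these names are real DirectStorage (or PS5) APIs.
#include <cstdint>
#include <cstdio>
#include <vector>

// Model A: a plain filesystem-style API - open, read into CPU-visible memory, done.
std::vector<uint8_t> loadAssetSimple(const char* path) {
    std::vector<uint8_t> data;
    if (FILE* f = std::fopen(path, "rb")) {
        std::fseek(f, 0, SEEK_END);
        data.resize(static_cast<size_t>(std::ftell(f)));
        std::fseek(f, 0, SEEK_SET);
        std::fread(data.data(), 1, data.size(), f);
        std::fclose(f);
    }
    return data;
}

// Model B: a hypothetical queue-based API where each request names a destination
// (CPU memory or a GPU resource) and completion is signalled asynchronously.
struct IoRequest {
    const char* path;
    uint64_t offset;
    uint64_t size;
    void*    gpuDestination;   // e.g. a texture tile, not a CPU buffer
    bool     decompressOnLoad; // hand off to the hardware decompressor
};

struct IoQueue {
    void enqueue(const IoRequest& r) { pending.push_back(r); }
    void submit() { /* kick the hardware, wait on a fence for completion */ }
    std::vector<IoRequest> pending;
};
```

Model A is trivially familiar; Model B is more powerful but forces you to think about destinations, batching and fences, which is the kind of extra complexity I mean.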

As for GPU bandwidth, the Series X really seems to be built around the idea of efficiency in accesses vs raw bandwidth. For example, Sampler Feedback's intention is to access only the parts of textures that are needed vs loading the whole texture into memory for sampling. The Sampler Feedback API allows you to figure out which parts of the texture will be sampled, and then load only those parts. The thing is, it seems to best fit a particular model of virtual textures with a tile cache. You have a bit of added complexity in terms of learning a new API, but you're also somewhat forced to adopt a particular memory management model for textures. Maybe that's not compatible with how some existing code bases are already set up. I don't know how CryEngine works right now. So, as a thought, in the situation where the engine is streaming large textures from the NVMe into RAM, you're now in a situation where you're exceeding the 10GB because you're loading entire textures instead of the necessary parts, and you're hitting the limit of the SSD bandwidth because you're not selectively reading.
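A rough sketch of that tile-cache model, just to illustrate the shape of it. This is not the actual D3D12 Sampler Feedback API - the types and the 64KB tile size are stand-ins:

```cpp
// Illustrative only: not the real Sampler Feedback API. The idea is that the
// GPU records which texture tiles it actually sampled, and the streamer then
// requests only the tiles that aren't already resident, instead of whole textures.
#include <cstdint>
#include <unordered_set>
#include <vector>

using TileKey = uint64_t; // packed texture id / mip level / tile x / tile y

// sampledThisFrame: tiles the GPU touched (read back from a feedback map).
// resident: tiles already held in the tile cache in RAM.
// Returns the small set of tiles to fetch from the SSD (e.g. ~64KB reads each).
std::vector<TileKey> tilesToStream(const std::vector<TileKey>& sampledThisFrame,
                                   const std::unordered_set<TileKey>& resident)
{
    std::vector<TileKey> missing;
    for (TileKey key : sampledThisFrame)
        if (resident.count(key) == 0)
            missing.push_back(key);
    return missing;
}
```

The point is that an engine has to be organised around tiles and residency tracking to benefit; if it streams whole textures today, it gets neither the RAM saving nor the bandwidth saving.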

As for the front-end, I think it's somewhat the same situation. Mesh Shaders are a total rewrite of the render pipeline before rasterization - a fully threaded, compute-driven approach. It will take time for developers to learn and optimize for mesh shaders. If you haven't done that yet, you're left with the existing pipeline, which has bottlenecks that will favour high clock speeds.
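To give a sense of what "compute driven" means there: the mesh gets pre-split into small clusters ("meshlets") that threadgroups can process and cull independently, instead of pushing one big index buffer through the fixed-function front end. A naive sketch of building that data is below - the 64/126 limits are just commonly quoted examples, nothing Series X specific:

```cpp
// Illustrative meshlet builder: splits a triangle list into small clusters.
// A real builder would deduplicate vertices and optimise for cache locality.
#include <cstdint>
#include <vector>

struct Meshlet {
    std::vector<uint32_t> vertexIndices;   // up to ~64 vertices per cluster
    std::vector<uint8_t>  localTriangles;  // up to ~126 triangles, local indices
};

std::vector<Meshlet> buildMeshlets(const std::vector<uint32_t>& indices,
                                   size_t maxVerts = 64, size_t maxTris = 126)
{
    std::vector<Meshlet> meshlets(1);
    for (size_t i = 0; i + 2 < indices.size(); i += 3) {
        const Meshlet& m = meshlets.back();
        // start a new cluster if this triangle would overflow the current one
        if (m.vertexIndices.size() + 3 > maxVerts ||
            m.localTriangles.size() / 3 + 1 > maxTris) {
            meshlets.emplace_back();
        }
        Meshlet& cur = meshlets.back();
        for (size_t j = 0; j < 3; ++j) {
            // naive: re-adds shared vertices instead of deduplicating them
            cur.vertexIndices.push_back(indices[i + j]);
            cur.localTriangles.push_back(
                static_cast<uint8_t>(cur.vertexIndices.size() - 1));
        }
    }
    return meshlets;
}
```

Each meshlet then maps naturally onto a threadgroup that can cull and emit its own geometry, which is the part engines have to be rewritten to exploit.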

This is all speculation on my part.
 
I don't doubt that the PS5 SSD is everything Cerny says it is, but the issue of latency does have me wondering about how PCIe cards will handle things.

Even with the large increases in peak read speed of, say, a 7GB/s SSD, the latency of Sony's solution would still be lower (as far as I can see). That poses a bit of a conundrum.

Perhaps Sony will end up making their own expansion drives using their own controller...?

If there's less latency in the overall PS5 solution vs commercial off-the-shelf products, it's down to the custom elements outside of the SSD rather than the SSD itself, which as far as I've read is only non-standard in the number of priority levels it allows (6 vs 2). I guess you could argue its size and speed are a little unusual too.

I am struggling with the latency argument, though. If we talk in system memory terms, then both devices are pulling data from a high-speed SSD over a PCIe 4.0 x4 interface through an IO block which connects to the memory/CPU/GPU via AMD's Infinity Fabric. Sony have added a few extra elements to the IO block, like the decompressor and coherency engines, but a decompressor isn't going to reduce latency over an uncompressed data stream (if anything it's going to increase it).

So you're left with the coherency engines + cache scrubbers and any differences to the software stack between the two. But since no-one actually knows what DirectStorage does or how it works in the PC space, I don't see how we're in a position to be comparing those at this stage.
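On the priority levels specifically, the difference is easy to picture: with more levels, a tiny latency-critical read can be ordered ahead of big bulk streaming reads. A toy illustration (not any real controller or driver interface):

```cpp
// Toy illustration of request priorities in an SSD submission queue.
// Not a real controller or driver API - purely to show the scheduling idea.
#include <cstdint>
#include <queue>
#include <vector>

struct Request {
    uint32_t priority;   // 0 = most urgent (e.g. a texture tile the GPU needs now)
    uint64_t offset;     // where to read from
    uint64_t size;       // how much to read
};

// Lower priority value gets serviced first.
struct ByPriority {
    bool operator()(const Request& a, const Request& b) const {
        return a.priority > b.priority;
    }
};

using RequestQueue =
    std::priority_queue<Request, std::vector<Request>, ByPriority>;

// With only two levels everything is either "urgent" or "bulk"; with six, the
// scheduler can keep several kinds of latency-sensitive reads ahead of streaming.
```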

So question on BCPack... Does what is "BCPack(ed)" need to be decompressed for GPU use or is it a new native GPU format?

What's the current native GPU compression format? Have we been comparing apples and oranges here? I.e. if GPUs already handle data natively in a compressed format, then should we be adding that compression ratio to the raw throughput of all drives?
 
Isn't this impossible, though? I mean, I know we have a 22GB/s theoretical best case with maximum compression, but this would be writing 13 and reading 13, giving a total of 26GB in less than a second? Hasn't it been confirmed there's no compressor, only a decompressor? That would mean it would take more than 2 seconds to write what's in memory to the SSD at 5.5GB/s.
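The back-of-the-envelope numbers, assuming the full 13.5GB of game-usable RAM gets written out and that writes run at roughly the 5.5GB/s raw read figure (which is probably generous for writes):

```cpp
// Rough arithmetic for the suspend-to-SSD scenario discussed above.
// 13.5GB is the nominal game-usable RAM figure; 5.5GB/s is the raw PS5 read
// speed, used here as an optimistic stand-in for write speed.
#include <cstdio>

int main() {
    const double ramGB    = 13.5;
    const double writeGBs = 5.5;   // assumed; real writes are likely slower
    const double readGBs  = 5.5;   // raw read, before any decompression gains

    std::printf("write out: %.2f s\n", ramGB / writeGBs);                    // ~2.45 s
    std::printf("read back: %.2f s\n", ramGB / readGBs);                     // ~2.45 s
    std::printf("full swap: %.2f s\n", ramGB / writeGBs + ramGB / readGBs);  // ~4.9 s
    return 0;
}
```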

I really hope the PS5 and XSX are not saving the entirety of RAM when switching between games or going into rest mode. Most of what is in RAM is game assets that can be re-loaded. You really only want the game state, i.e. where everything is in the game world and what it is doing - just like a save file. This should be a considerably smaller amount of data.

Otherwise, switching games is going to eat into that 825GB/1TB SSD quickly. I.e. if you regularly swap between 5 games, you've just lost 50GB of 'swap' space. Writing out gigabytes of RAM state on every switch/sleep is going to put a lot of wear on those drives. That is potentially way more than typical PC drives have to contend with.

So you're left with the coherency engines + cache scrubbers and any differences to the software stack between the two. But since no-one actually knows what DirectStorage does or how it works in the PC space, I don't see how we're in a position to be comparing those at this stage.

Are you talking about the PCs you and I own today, or PCs that will be built in 12-24+ months' time with new bus architectures, controllers and I/O chains that take advantage of DirectStorage? Because an API cannot negate the bottlenecks that exist in the PCs that you and I own today. You need better hardware. New hardware needs a new API. The API has to come first; the hardware will come after.

You need new hardware to support a fast SSD coupled with a controller that can decompress certain data and dump it at ~20GB/s into DDR4 and/or GDDR6 without impacting the rest of the system. That's the goal.
 
I was thinking about suspend. It would be possible to do it in a super smart way, but it would mean game engines have to support it. The low-hanging fruit would be to force developers to use streaming heavily. Then, when suspending to disk, only store metadata, and when returning to the game, stream the content back from its original location. This would likely save an insane amount of space on textures, sounds and other immutable data. It could also go a long way toward fixing the load-time issue: stream in the lowest-level LOD first to get back into the game quickly, then proceed to stream in the higher-quality assets as soon as possible.
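Purely as a sketch of that idea (every name here is made up), the suspend image only needs the mutable game state plus enough information to re-stream the immutable assets from the install data:

```cpp
// Hypothetical "suspend stores metadata only" layout. Instead of dumping all
// of RAM, the engine writes out its mutable state plus references to the
// immutable assets it had streamed in, and re-streams those on resume.
#include <cstdint>
#include <string>
#include <vector>

struct StreamedAssetRef {      // enough to re-fetch the asset, not the asset itself
    std::string packagePath;   // original location in the game's install data
    uint64_t    offset;
    uint64_t    size;
    uint32_t    residentLod;   // which LOD level was loaded when we suspended
};

struct SuspendImage {
    std::vector<uint8_t>          gameState;   // world/sim state, like a save file
    std::vector<StreamedAssetRef> assets;      // textures, audio, meshes to re-stream
};

// On resume: restore the small game state immediately, kick off the lowest-LOD
// streams first so the player is back in the game quickly, then refine upward.
```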
 
Do we know this for a fact?
Pretty much. There's no reason for 36 CUs beyond that, and we know devs did target specific CUs with their code. It seems bizarre that the GPU is so constrained, as we're used to swapping GPUs with differing core counts on the PC and it just working, and it's hard to imagine why devs would still be targeting hardware so low-level that games can break on compatible hardware. But if you think about it, there's some reason, even if odd, to go with 36 CUs, whereas there's no particular reason to go with a really hot, narrow chip. So BC seems the only justification.
 
Maybe they need a little bit more than the 10GB for fast GPU accesses.
My question still comes down to: out of 13.5GB, how little would the game code, audio, and anything else that doesn't need high bandwidth take up? 3.5GB sounds pretty small to me, so I'm more expecting the slow-access stuff to overflow into the fast section than the other way around.
But I'll be more than happy for someone to show me that a game engine only needs a fraction of 3.5GB and that 10GB is a huge hindrance compared to 11GB.
For example, Sampler Feedback's intention is to access only the parts of textures that are needed vs loading the whole texture into memory for sampling.
This is also where I believe BCPack comes into play - that it allows better partial texture retrieval compared to other package formats.
I could be misremembering, though.
But I thought that was one of the positives he was pointing out, and why throughput could essentially be better than initial perceptions suggest.

In the end, developers will have to code to take advantage of these things, like always; it just depends how hard it is. But some things will simply get used because they're the best way to get the performance you require.
 
Microsoft has insisted that the Xbox Series X frequency is constant under all circumstances, but Sony does not take that approach: it gives the console a fixed amount of power and lets the frequencies vary depending on the situation. What are the differences between the two, and which will be better for developers?

What Sony has done is much more logical, because the console decides whether the GPU frequency or the CPU frequency runs higher at any given time, depending on the processing load. For example, on a loading screen only the CPU is needed and the GPU is barely used, while in a close-up scene of a character's face the GPU gets involved and the CPU plays a very small role. On the other hand, it's good that the Series X has strong cooling and guarantees a constant frequency without throttling, but the practical freedom Sony has given is really a big deal.

Substantially different take on their SmartShift from what everyone else has talked about. He's making it seem like power is shifting around all the time, when we have been told that it's holding at max at all times.
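For reference, a toy model of what "fixed power budget, variable clocks" means. The 3.5GHz / 2.23GHz caps are the announced PS5 maximums; the total wattage, the split logic and the clock scaling are completely made up, only to show the extreme cases the quote describes - as noted above, we've been told the clocks hold at or near max most of the time.

```cpp
// Toy model: a constant total power budget split between CPU and GPU by load.
// 3.5 and 2.23 are the announced PS5 max clocks; everything else is invented.
#include <algorithm>
#include <cstdio>

struct Clocks { double cpuGHz, gpuGHz; };

Clocks allocate(double totalWatts, double cpuLoad, double gpuLoad) {
    // split the budget in proportion to demand, with a floor for each side
    double gpuShare = std::clamp(gpuLoad / (cpuLoad + gpuLoad + 1e-9), 0.2, 0.8);
    double gpuWatts = totalWatts * gpuShare;
    double cpuWatts = totalWatts - gpuWatts;
    // pretend each side runs at max clock unless its power share is squeezed
    double cpuGHz = std::min(3.5, 3.5 * cpuWatts / (0.4 * totalWatts));
    double gpuGHz = std::min(2.23, 2.23 * gpuWatts / (0.6 * totalWatts));
    return { cpuGHz, gpuGHz };
}

int main() {
    Clocks loading  = allocate(200.0, /*cpuLoad=*/0.9, /*gpuLoad=*/0.1);
    Clocks closeUp  = allocate(200.0, /*cpuLoad=*/0.1, /*gpuLoad=*/0.9);
    std::printf("loading screen: CPU %.2f GHz, GPU %.2f GHz\n", loading.cpuGHz, loading.gpuGHz);
    std::printf("close-up scene: CPU %.2f GHz, GPU %.2f GHz\n", closeUp.cpuGHz, closeUp.gpuGHz);
    return 0;
}
```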
 