Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

That's understood. The concern is whether Kraken-type compression on the SSD allows PRT tiles to be read. If textures aren't compressed beyond DXTC, we could store and read texture tiles. If we crunch a whole texture down to its smallest size for fast loading, can we then load individual tiles within that archive? One would assume not. One can't dive into a .zip of a text document and pull out the letters at positions 15 and 27 without unzipping the whole thing.

For SVT, textures are tiled at a fixed size during development. When using PRT, only the high-resolution tiles that are visible are transferred to VRAM, so you can't compress entire mips (apart from those smaller than the fixed tile size). You compress the individual tiles, so each mip level is represented by a composite of compressed tiles.

ZLIB or Kraken compression doesn't inhibit the ability to request the individual tiles needed by SVT. Unreal Engine already accommodates zlib for virtual texturing to minimize I/O costs. Unreal also allows the use of Crunch, which employs both lossy and lossless compression on disk and transcodes to a supported block-compression format for VRAM.
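A minimal sketch of the per-tile idea, assuming nothing about Unreal's actual container format: each fixed-size tile is compressed independently (zlib here as a stand-in), so any single tile can be located via an offset table and decompressed without touching the rest of the texture. Compressing a whole mip as one stream would lose exactly this property, which is the .zip analogy above.

```python
import zlib

TILE = 128 * 128 * 4  # illustrative 128x128 RGBA tile, not an engine-mandated size

def pack_tiles(tiles):
    """Compress each tile on its own and record (offset, length) per tile."""
    blob, index = bytearray(), []
    for t in tiles:
        c = zlib.compress(t)
        index.append((len(blob), len(c)))
        blob += c
    return bytes(blob), index

def read_tile(blob, index, i):
    """Decompress only tile i; no other tile is touched."""
    off, length = index[i]
    return zlib.decompress(blob[off:off + length])

tiles = [bytes([i]) * TILE for i in range(4)]   # four dummy tiles
blob, index = pack_tiles(tiles)
assert read_tile(blob, index, 2) == tiles[2]
```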
 
Byte-addressable storage is already a thing in server-space, but there aren't - or weren't (and I think it's still the case) - any commercial byte-addressable filesystems, so they tend to get utilised just like slow RAM, with the inherent advantages / disadvantages that you would expect. I think their use is pretty niche; generally, if you need byte-addressable storage you're working in a field where funding is sufficient to shove terabytes of RAM into your servers. In my previous job, some of our servers had 2 petabytes of RAM. And it wasn't enough! :no:
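For a sense of what "used like slow RAM" looks like in practice, here's a hedged sketch: memory-mapping a file gives ordinary load/store access, and on a DAX-mounted persistent-memory filesystem that maps straight to byte-addressable media (the path below is just a placeholder; on a normal disk the same code goes through the page cache).

```python
import mmap
import os

# Substitute a file on a DAX-mounted persistent-memory filesystem (e.g. /mnt/pmem0/...)
# to get real byte-addressable media underneath; on a normal disk this uses the page cache.
path = "/tmp/pmem_scratch.bin"
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
os.ftruncate(fd, 1 << 20)            # 1 MiB region
buf = mmap.mmap(fd, 1 << 20)

buf[0:4] = b"\xde\xad\xbe\xef"       # plain byte-granularity stores, no block I/O in the app's view
print(buf[0:4].hex())
buf.flush()                          # msync; durability guarantees depend on the platform
```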

I know. I already mentioned non-volatile memory in RDMA applications when discussing the MS XVA and its "100 GB of available extended memory". There is no byte-addressable NVMe SSD currently in production, or even as a publicly known prototype.
 
Write performance is very dependent on block size and page size. And a lot of shuffling data and remapping magic to make it not suck as much as it does on paper. Block sizes are gigantic today.

But reaching full bandwidth on read is more dependent on the read size versus IOPS (and random IOPS is related to page size x channels x speed, if the controller can keep up). In theory PS5 controller should be capable of over a million IOPS, so even down to 8KB random read shouldn't be a problem as long as the queues can be filled fast enough. There's no seek but there's an overall latency of something like 25us to 50us.
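Rough arithmetic behind that claim, using the figures quoted in the discussion (the ~1M IOPS and ~5.5 GB/s numbers are quoted specs, not measurements):

```python
iops = 1_000_000            # claimed controller random-read capability
read_size = 8 * 1024        # 8 KB random reads
raw_rate = 5.5e9            # PS5's quoted raw (uncompressed) rate in bytes/s

achievable = iops * read_size
print(achievable / 1e9)     # ~8.2 GB/s of 8 KB reads, comfortably above the raw link rate
print(raw_rate / read_size) # ~671k IOPS actually needed to saturate 5.5 GB/s at 8 KB
```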
 
A byte addressable SSD?
Nope. Changes they were considering in SSD access to mitigate overheads. I think it's actually the file allocation table, with larger data-block sizes.
http://www.freepatentsonline.com/y2017/0097897.html

Disclosed herein is an information processing device including a host unit adapted to request data access by specifying a logical address of a secondary storage device, and a controller adapted to accept the data access request and convert the logical address into a physical address using an address conversion table to perform data access to an associated area of the secondary storage device, in which an address space defined by the address conversion table includes a coarsely granular address space that collectively associates, with logical addresses, physical addresses that are in units larger than those in which data is read.​
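As a rough illustration of the idea in the abstract (not the patent's actual data structures): one coarse table entry maps a large logical extent to a physical base, and reads at finer granularity just add an offset, which keeps the mapping table small.

```python
EXTENT = 128 * 1024 * 1024              # illustrative coarse granularity, not from the patent

# logical extent index -> physical base address (made-up values)
coarse_map = {0: 0x1_0000_0000, 1: 0x7_4000_0000}

def translate(logical_addr):
    extent, offset = divmod(logical_addr, EXTENT)
    return coarse_map[extent] + offset   # one entry covers the whole 128 MB span

print(hex(translate(5 * 4096)))          # a small read still needs only one coarse lookup
```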
 
Write performance is very dependent on block size and page size. And a lot of shuffling data and remapping magic to make it not suck as much as it does on paper. Block sizes are gigantic today.

But reaching full bandwidth on read is more dependent on the read size versus IOPS (and random IOPS is related to page size x channels x speed, if the controller can keep up). In theory PS5 controller should be capable of over a million IOPS, so even down to 8KB random read shouldn't be a problem as long as the queues can be filled fast enough. There's no seek but there's an overall latency of something like 25us to 50us.

How do the reads actually work? If you read from an SSD, does it read a full block and then save the page that you requested, or can it actually do a page-sized read if the pages are smaller than the blocks? My thinking is along the lines of a cache line from CPU to memory, where you always read the whole line even if you only need a few bytes. If you read more than one page from the same block, does it actually just read the block once so the second page is read from the drive's cache?
 
Nope. Changes they were considering in SSD access to mitigate overheads. I think it's actually the file allocation table, with larger data-block sizes.
http://www.freepatentsonline.com/y2017/0097897.html

Disclosed herein is an information processing device including a host unit adapted to request data access by specifying a logical address of a secondary storage device, and a controller adapted to accept the data access request and convert the logical address into a physical address using an address conversion table to perform data access to an associated area of the secondary storage device, in which an address space defined by the address conversion table includes a coarsely granular address space that collectively associates, with logical addresses, physical addresses that are in units larger than those in which data is read.​

Interesting. That sounds a lot like the TLB in a virtual memory paging system.
 
How do the reads actually work? If you read from an SSD, does it read a full block and then save the page that you requested, or can it actually do a page-sized read if the pages are smaller than the blocks? My thinking is along the lines of a cache line from CPU to memory, where you always read the whole line even if you only need a few bytes. If you read more than one page from the same block, does it actually just read the block once so the second page is read from the drive's cache?
I don't know it at that depth... The NAND chips do have a full page register for any operation. You send a command to change the address, which makes the page data available in the page register; then you can either send a command to read/write part of it or all of it, or copy some of it to another page (great for wear-leveling operations). Also, there are multiple operations that can be done at the same time on different planes/blocks.

On the controller side, facing the PC, it has to address and read by sector size: 512-byte small mode or 4K big mode.

So as a shortcut, the simplest way is to just use the controller's random IOPS spec, which is a fraction of the total number of transactions happening between the controller and the NAND chips.

Phison PS5018-E18:
NAND side: 8 channels at 1200 MT/s per channel = 9600 million transfers/s
Controller: 1 million 4K random read IOPS
So it's 9600 transfer cycles on the NAND bus per complete cmd/addr/data operation, including status polling cycles, ECC data, everything.

It's unclear whether that means an 8KB page size is required to reach those specs. Would using chips with a 16KB page size drop the IOPS by half? Or is the controller 1M IOPS regardless of data organisation?
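Reproducing the arithmetic above (numbers as quoted in the post; whether the rated IOPS hold at other page sizes is exactly the open question):

```python
channels = 8
transfers_per_channel = 1200e6   # 1200 MT/s per channel, as quoted
rated_iops = 1e6                 # rated 4K random-read IOPS

total_transfers = channels * transfers_per_channel
print(total_transfers / rated_iops)   # ~9600 bus transfers per completed cmd/addr/data/status cycle
```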
 
@MrFox I was trying to read up on it and I found it very confusing. I know a block is made of pages, and the chips have a native page size and block size. The controller basically translates page reads from whatever the filesystem is trying to access into reads from the NAND chips, but I'm not sure what an actual read looks like. It seems like larger page sizes lead to better throughput, up to a point, so I wasn't sure if reading smaller pages meant you're wasting part of your reads because the chip is reading a larger page or block. Confusing for the layman.
 
I have a good question I'd like to ask. First of all, I'm new here, so "hello world". The question is regarding the SSDs used as virtual RAM in next-gen consoles, especially Series X, and specifically the 100GB that MS keeps mentioning: could it mean that it's used as a RAM cache during runtime? For instance, you're in a war game and you do some destruction: broken walls, bullet holes, fragments everywhere, dead NPCs. Could this data be written to the 100GB to avoid disappearing?
Because currently dead NPCs disappear, and most things disappear, since the RAM can't hold enough data.
 
That's one of the anticipated uses for SSDs in next-gen consoles. Some of that is data persistence and some is just drawing overhead though. eg. You have to have dead bodies disappear because even if you have enough RAM to store them (and you do; they're just a few numbers to indicate where a copy of an object is to be placed), eventually you can't render that many piled up bodies fast enough, and can't compute the physics needed to clamber over them all.

Expect improvements in some games, but also expect for game-play/logistics reasons that there'll still be unnatural behaviours in games like disappearing things.
 
What's the size of the pages in Sony's SSD? I don't think we know. Often the minimum page size on an SSD is quite large; the flash pages could be 256kB or even 512kB. The Optane argument really is valid for small pages, as SSDs do have overhead on the minimum read size and Optane can read storage like RAM. That's also the same reason TRIM was invented and early SSD drives could hitch when the disk started to be full(ish): there would be free space on the disk but no free blocks, so on writes the drive would have to read many blocks that are not completely full, rearrange the data and write it back. TRIM does this behind the scenes.

Games would have to pack the data into fairly sized chunks to optimize SSD bandwidth. In the best case the SSD block size and the Kraken compression block size are the same. I assume this is no problem, because most data is large and the very small data can be cached in RAM anyway if it turns out to be an issue (it's small; pack small data together into larger blocks and cache it in RAM as needed...).
It should be something optimized to transfer through the 12-channel bus in one or two bus clock cycles. The match between filesystem, SSD and bus is what makes it so fast, IMHO.
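A hedged sketch of that packing idea (the chunk size and layout are illustrative, not anything Sony or Kraken specifies): small assets get appended into fixed-size chunks so each read request lines up with one compression/IO block.

```python
CHUNK = 64 * 1024   # illustrative block size; pick whatever matches the real IO granularity

def pack(assets):
    """assets: list of (name, data) pairs, each item assumed smaller than CHUNK."""
    chunks, index = [bytearray()], {}
    for name, data in assets:
        if len(chunks[-1]) + len(data) > CHUNK:
            chunks.append(bytearray())           # start a new chunk rather than straddle two
        index[name] = (len(chunks) - 1, len(chunks[-1]), len(data))  # chunk, offset, size
        chunks[-1].extend(data)
    return [bytes(c.ljust(CHUNK, b"\0")) for c in chunks], index

chunks, index = pack([("tree_lod2", b"x" * 5000), ("rock_03", b"y" * 62000)])
cid, off, size = index["rock_03"]
assert chunks[cid][off:off + size] == b"y" * 62000   # one aligned chunk read recovers the asset
```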
 
That's one of the anticipated uses for SSDs in next-gen consoles. Some of that is data persistence and some is just drawing overhead though. eg. You have to have dead bodies disappear because even if you have enough RAM to store them (and you do; they're just a few numbers to indicate where a copy of an object is to be placed), eventually you can't render that many piled up bodies fast enough, and can't compute the physics needed to clamber over them all.

Expect improvements in some games, but also expect for game-play/logistics reasons that there'll still be unnatural behaviours in games like disappearing things.
Judging by how the UE5 demo works, it only loads the number of triangles needed per pixel count, so it wouldn't matter how many objects are on screen. As for physics, it could be an issue for dead bodies, but it could allow for more dead NPCs than the 13GB RAM limit; or they could make bodies far from the player static and the ones closer rigid.
That's my assumption, since just loading raw original data from the SSD and not being able to add data during runtime is a bit pointless!
 
UE5 only does that specific function with its static terrain. It doesn’t apply this everywhere however. So there are limitations to their technology, like all technologies. That static terrain will look sweet, but may not be what most games want. Some may want more characters on screen, or a lot of different objects on screen. We aren’t sure what it can handle. I think we are convinced it can handle a lot, but it’s quite unknown.
 
We don't know for sure if the Nanite tech only works for static terrain. Even if it does, I'm just talking about streaming the polygons you see in a frame; that shouldn't be expensive or require Nanite, same as virtual texturing. It isn't just terrain: you can do objects too, trees, buildings, grass and so on, even characters. So given that WYSIWYG approach, the 100GB in the XVA makes sense: you can cache additive data created during runtime when playing a game. And that's what I said: not just NPCs. Destroyed buildings, bullet holes, rubble, a lot of states and interactions. For instance, cars won't disappear as they do in GTA when you look away. A lot of such states can be stored on the SSD during runtime and don't have to be just things you see on screen.
 
Fairly positive it’s static meshes only. So nothing that can deform every frame. Ie character animations etc.

They are unlikely to keep things in memory for that type of thing unless it’s absolutely vital to the gameplay. You’ll explode budgets if you keep layering things onto the last thing.
 
Judging by how the UE5 demo works, it only loads the number of triangles needed per pixel count, so it wouldn't matter how many objects are on screen...
Storing the data for dead bodies isn't the problem. You'll have a copy of the body mesh in memory, and draw that data for each body on screen; each 3D object on screen only occupies as much RAM as one model regardless of how many times it's drawn.

eg:
There's only one model here, so the amount of bodies isn't dependent on memory and doesn't require virtual RAM to be able to add more or keep them. However, drawing and processing them can end up demanding.
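A toy illustration of that point (the names and structures here are made up, and a real engine would use instanced draw calls rather than Python objects): the mesh exists once in memory, and each placed body is just a handful of numbers referencing it.

```python
from dataclasses import dataclass

@dataclass
class Mesh:                 # the heavy data, held in memory exactly once
    vertices: list

@dataclass
class Instance:             # per-body cost: a reference plus a few floats
    mesh: Mesh
    position: tuple

body = Mesh(vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
corpses = [Instance(body, (i * 1.5, 0.0, 0.0)) for i in range(1000)]

# A thousand bodies, one copy of the vertex data; the cost that grows is the
# per-frame work of drawing and simulating them, not the RAM for the mesh.
assert all(c.mesh is body for c in corpses)
```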

Nanite is something different to just having an SSD, subdividing the view space to select only the triangles that need to be drawn. Even without SSD, if a game could cull and draw triangles super efficiently it wouldn't have a problem rendering thousands of piled up bodies as it would only draw the visible triangles, like Nanite, but that likely can't be done for dynamic objects. As such, we're stuck processing each triangle for visibility, and whatever optimisations we can do there for occlusion and triangle rejection.

Possibly, using the SSD, a dead body could be baked into static geometry and then streamed from SSD, but I doubt that can happen. The process of creating the streamable 3D data from the raw geometry is probably quite intensive and not something that can happen in realtime, especially with a game having to be run concurrently. It'll be interesting to learn how the UE5 demo handled its few scenery changes.

...not just NPCs. Destroyed buildings, bullet holes, rubble, a lot of states and interactions. For instance, cars won't disappear as they do in GTA when you look away. A lot of such states can be stored on the SSD during runtime and don't have to be just things you see on screen.
Yes, some stuff can be stored as persistent data. One just has to be aware of what the limiting factors are. If the reason something isn't persistent is because there isn't fast writeable storage, SSD will improve that. If it's because of something else, like processing power to simulate the world space it occupies, then fast writeable storage won't solve persistence, although something else in next-gen might.

As a mod, may I also politely point you to the posting requirements of B3D and the need for (mostly!) correctly written English using sentences and punctuation. We're not grammar nazis but we do request a level of readability just to aid with communication.
 
OK, fair enough, but yeah, I mean the mesh data of one body is stored in RAM to reduce CPU draw calls, more like how instancing works, but all those bodies are physically in memory; they have to be, there's no magic here. And I bet the PS5's SSD is fast enough to both stream from SSD and cache, and I don't think caching would take any significant performance budget. Plus this is the only way that 100GB makes sense; otherwise it's just some fancy word.

Sorry for my spelling; I'm using a phone so I'm typing fast.
 
@MrFox I was trying to read up on it and I found it very confusing. I know a block is made of pages, and the chips have a native page size and block size. The controller basically translates page reads from whatever the filesystem is trying to access into reads from the NAND chips, but I'm not sure what an actual read looks like. It seems like larger page sizes lead to better throughput, up to a point, so I wasn't sure if reading smaller pages meant you're wasting part of your reads because the chip is reading a larger page or block. Confusing for the layman.
It certainly does waste potential bandwidth, if there's more bandwidth available. So the point where a bigger and bigger page no longer provides a throughput advantage is when the number of random page loads per second the chip is capable of exceeds the interface anyway.

So if a typical NAND die on a channel can execute 80,000 random read requests per second (1/25µs = 40K, usually 2 planes per die which execute in parallel), and its interface is 667MB/s, the page size becomes a bit irrelevant above 8KB. If it can only do 40,000 reads per second, that would be 16KB, and so on. The chips connected to the controller are not necessarily perfect ones that can do their max rated IOPS; same for max bandwidth.
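The crossover arithmetic, using the numbers above (all taken from the post, not from datasheets):

```python
reads_per_sec = 80_000        # ~ (1 / 25us) x 2 planes, per chip
channel_bw = 667e6            # 667 MB/s ONFI channel

crossover = channel_bw / reads_per_sec
print(crossover)              # ~8.3 KB: beyond this page size the channel is the bottleneck anyway

# Halving the read rate (40k/s) doubles the crossover to ~16 KB, as noted above.
print(channel_bw / 40_000)
```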

The organization is:
many planes per channel
many blocks per plane
many pages per block
many cells per page
many bits per cell

Yes, it's confusing, I'm pretty sure 32KB organization would saturate any chips available, probably 16KB too, but the datasheets are no longer available publicly, so I don't know if nand chips are faster than 25us today, or if it's become worse with QLC. Or if they are moving to 4 planes to improve IOPS, etc....

EDIT: Nope! Can't do partial read!

www.onfi.org/-/media/client/onfi/specs/onfi_4_2-gold.pdf
Command set is chapter 5 page 182.
Read sequential and read random chapter 5.15 page 231.

While writes can be partial pages, the read ops always start at column 0. The small-data operations within a page are only applicable to writes and copyback. There doesn't seem to be a "small read op" that accepts a column as an argument (it's right there, but the column is ignored). So maybe it could read the first part of a page and move on to something else, but it cannot write to the page's data pointer. Unless I missed an instruction somewhere.

Hmm... The controller is entirely responsible for culling this, and if they start making the pages significantly bigger than 16KB they will need the next ONFI spec to start supporting the column parameter in the random read cache op, or raise the interface bandwidth accordingly so the controller has enough to work with.
 
You pessimist. The UE5 demo on PS5 was 1440p@30Hz, and Epic are apparently aiming for 1440p@60Hz. That works out to roughly 4K@30Hz.

Let's not forget that 4k is actually the biggest resolution jump we've had for a console generation. I'm pretty sure that we're seeing one of the biggest jumps in compute from base consoles to next gen too.
The UE5 demo has no ray tracing.

Even if the UE5 demo can be optimized to 1440p@60fps, when developers go for better visual quality with ray tracing or other tech, the resolution and performance will drop again, and we may still have only ~50% of 4K resolution on the base consoles.
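Rough pixel-rate arithmetic behind the "1440p60 ≈ 4K30" and "~50% of 4K" remarks (it ignores fixed per-frame costs, so it's only ballpark):

```python
px_1440p_60 = 2560 * 1440 * 60        # ~221 Mpix/s
px_4k_30    = 3840 * 2160 * 30        # ~249 Mpix/s

print(px_1440p_60 / px_4k_30)         # ~0.89: roughly comparable throughput
print((2560 * 1440) / (3840 * 2160))  # ~0.44: 1440p is a bit under half of 4K per frame
```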
 