Blazing Fast NVMEs and Direct Storage API for PCs *spawn*

Discussion in 'PC Hardware, Software and Displays' started by DavidGraham, May 18, 2020.

  1. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,123
    Likes Received:
    3,093
The GPU is probably the more advanced and much faster path. The fixed-function hardware is just that: fixed. NV (and AMD) can increase speed by allocating more resources, for example.

I think some have shot down NV's decompression tech way too early, without knowing how it even works.
     
  2. Remij

    Regular

    Joined:
    May 3, 2008
    Messages:
    684
    Likes Received:
    1,268
    Never bet against Nvidia.
     
    PSman1700 likes this.
  3. Vega86

    Newcomer

    Joined:
    Sep 25, 2018
    Messages:
    191
    Likes Received:
    131
    Have some new questions.

Next-gen consoles have about 13.5 GB available for games. I assume all data can be squeezed in there using their new, respective SSD compression systems.

    How is this gonna work on PC?

In Nvidia's graph, everything goes through the GPU. Does this mean Ampere cards with less than 13.5 GB of VRAM will have issues?

Does using DLSS reduce VRAM usage on PC, since rendering is really done at a lower resolution?

Or are PC developers going to have to split their data, so only some of it goes through Ampere's compression system and the rest through the typical PC I/O path?
     
  4. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,517
    Likes Received:
    24,424
Only what has to go to the GPU would go through RTX IO. Program code will always go through the CPU to main memory and be executed there. The PC can also use main memory as an even faster caching subsystem than the NVMe.
     
    PSman1700 and Vega86 like this.
  5. Vega86

    Newcomer

    Joined:
    Sep 25, 2018
    Messages:
    191
    Likes Received:
    131
Does that mean only consoles can do top-to-bottom significant compression, for both CPU and GPU data, while remaining performant?

While on PC, CPU-side data isn't meant to be significantly compressed, and only GPU-side data can be significantly compressed while staying performant?

    Would this lead to less optimized PC ports, worse than current gen?
     
  6. LordVulkan

    Newcomer

    Joined:
    Mar 31, 2015
    Messages:
    11
    Likes Received:
    25
I don't know why you would want to compress any CPU data on systems with at least 16GB of RAM; you would just want to fill as much RAM as possible at the game's start.

And there is no reason to think that NVIDIA doesn't have some big, performant lossless decompressor (either HW or using compute shaders) when they are advertising a decompressed throughput of 14 GB/s.
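The arithmetic behind such an advertised figure is simple: the decompressor multiplies the drive's raw read bandwidth by the compression ratio. A trivial sketch (the ~7 GB/s raw figure is an assumption here, a typical PCIe 4.0 NVMe number, not something stated in this thread):

```python
def effective_throughput(raw_gb_per_s: float, compression_ratio: float) -> float:
    """Decompressed (effective) throughput: every compressed byte read
    from the drive expands by the compression ratio after decompression."""
    return raw_gb_per_s * compression_ratio

# ~7 GB/s of compressed reads at an average 2:1 ratio yields ~14 GB/s
# of usable, decompressed data delivered to VRAM.
print(effective_throughput(7.0, 2.0))  # 14.0
```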
     
    Vega86 and PSman1700 like this.
  7. LordVulkan

    Newcomer

    Joined:
    Mar 31, 2015
    Messages:
    11
    Likes Received:
    25
Can't I edit messages in this forum? The first quote was unintentional, and by "16GB de RAM" I meant "16GB RAM".

    Sorry for the mistake.
     
  8. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,517
    Likes Received:
    24,424
No problem. It's the forum's anti-spam measures. After X number of posts or Y number of days, users should be able to edit posts within Z minutes of posting. I don't recall the specifics.
     
  9. Infinisearch

    Veteran

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
Yeah, I know what you mean... why would fixed-function vs. programmable have an effect on the nature of memory management for the decompression algorithm? Malloc-type functionality would still be the same. So if a fixed-function solution exists, then as long as the programmable one is similar with regard to memory allocation, what's the problem? As far as caches go, I've never heard of a special cache (something like unaligned access within a single line) for FF hardware.
     
    BRiT likes this.
  10. Vega86

    Newcomer

    Joined:
    Sep 25, 2018
    Messages:
    191
    Likes Received:
    131
    Would you happen to know what kinds of data are for RAM vs VRAM in terms of video games? I'm assuming textures go to VRAM but what else?
     
  11. Infinisearch

    Veteran

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
Consoles or PC? Before DX12/Vulkan, memory management for the GPU was handled by the drivers. I've heard they would sometimes put index buffer data into system RAM while vertex buffers went into VRAM. Of course, things like render targets go into VRAM. Beyond that, unless you use GPU compute for something, pretty much everything else goes into system RAM. BTW, just so you know, in the case of a UMA like the PS4's, VRAM and system RAM are the same/similar (potentially partitioned) but are potentially in different, though sometimes 'related', address spaces.
edit - different virtual memory translation units.
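Those driver-era placement heuristics can be summarised as a lookup table. This is purely illustrative, assuming the split described above; actual driver policy varied by vendor and workload:

```python
# Hypothetical placement table for a pre-DX12 discrete-GPU PC.
# On a UMA console these distinctions largely collapse into one pool.
PLACEMENT = {
    "render_target": "VRAM",  # written by the GPU every frame
    "texture":       "VRAM",  # sampled by the GPU
    "vertex_buffer": "VRAM",  # read per-draw by the GPU
    "index_buffer":  "RAM",   # drivers sometimes left these in system RAM
    "game_code":     "RAM",   # executed by the CPU
    "sim_state":     "RAM",   # AI/physics and other CPU-side data
}

def placement(resource_type: str) -> str:
    """Default to system RAM for anything the GPU doesn't touch."""
    return PLACEMENT.get(resource_type, "RAM")
```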
     
    #311 Infinisearch, Sep 3, 2020
    Last edited: Sep 3, 2020
    Vega86 and RagnarokFF like this.
  12. Dictator

    Regular

    Joined:
    Feb 11, 2011
    Messages:
    683
    Likes Received:
    3,975
    Source: Reddit Megathread on RTX 3000 launch (I would post it here, but the forum automatically interprets it as media and makes it fill the page)


It looks like it will run with any NVMe type; it's just a matter of finding out what the motherboard requirements are. Also good to see that the average/"typical" compression ratio is 2x.
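A quick way to see why 2x can only ever be a "typical" figure is to compress different kinds of data with an ordinary lossless codec. Here zlib is just a stand-in; the actual format NVIDIA uses is not public:

```python
import random
import zlib

random.seed(0)

# Highly redundant data (think tiled/flat asset regions) vs. data that is
# already near maximum entropy (e.g. previously compressed textures).
repetitive = b"grass_tile_" * 8192
noisy = bytes(random.getrandbits(8) for _ in range(len(repetitive)))

for name, data in [("repetitive", repetitive), ("noisy", noisy)]:
    ratio = len(data) / len(zlib.compress(data))
    print(f"{name}: {ratio:.2f}x")
```

The repetitive input compresses far beyond 2:1 while the noisy input barely compresses at all, so the advertised 2x is an average over realistic game data, not a guarantee.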
     
  13. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Those techniques are crucial, agreed, and I would expect every dev to have tackled this problem in some way, up to the limits of the APIs they're working with.

    So far I've not seen a description of how NVidia's decompression system works. There are two basic models we can talk about:
    1. a "block" of data is loaded from storage into VRAM, then the decompressor reads the block and writes to a new block somewhere else in VRAM
    2. a "block" of data is streamed from storage through the decompressor and ends up as a block somewhere in VRAM
    2 appears to be the preferable model. In terms of VRAM fragmentation it would appear to be better. As I understand it, PS5 is using this latter model.
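The two models can be sketched with zlib standing in for the GPU decompressor and plain Python buffers standing in for VRAM allocations (purely illustrative; the real pipeline is not public):

```python
import zlib

payload = b"texture block " * 4096
compressed = zlib.compress(payload)

def model_1(compressed_block: bytes) -> bytes:
    """Model 1: the compressed block first becomes resident in 'VRAM',
    then the decompressor reads it and writes a second allocation."""
    staging = bytes(compressed_block)   # compressed copy occupies VRAM
    return zlib.decompress(staging)     # output needs a second allocation

def model_2(compressed_block: bytes, chunk_size: int = 4096) -> bytes:
    """Model 2: compressed data streams through the decompressor in small
    chunks; only the decompressed result ever occupies a VRAM allocation."""
    d = zlib.decompressobj()
    out = bytearray()
    for i in range(0, len(compressed_block), chunk_size):
        out += d.decompress(compressed_block[i:i + chunk_size])
    out += d.flush()
    return bytes(out)

assert model_1(compressed) == model_2(compressed) == payload
```

Both produce identical output, but model 1 transiently holds the compressed block and the output in VRAM at once, which is where the fragmentation concern comes from.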

    1 combined with "too large" VRAM, e.g. 24GB :) would probably suffer rarely if ever with fragmentation (if the dev is paying attention).

    Regardless of model, the problem with textures is that it's tricky to predict how much VRAM is required at a given instant. The PS5 "textures load as the camera moves" model makes texture consumption of VRAM even more fiddlesome, since it encourages devs to massively overcommit textures: now, instead of loading all textures that could possibly be used within the next hour of stealth game play, say, the game is loading all textures needed for the next 60ms of game play.

    Also, in games where there are no loading screens, there's no "dedicated time" to perform "memory defragmentation".

As a PC dev, how are you supposed to build a game where EVERY user is expected to have a PC powerful enough to load all required textures tens of times per second? Are you gonna ship a "1337" version of the game for the 0.1% and use load screens and blurry textures for everyone else?
     
  14. LordVulkan

    Newcomer

    Joined:
    Mar 31, 2015
    Messages:
    11
    Likes Received:
    25
The algorithms involved will scale with bandwidth; only PS5 first-party studios can afford to design an engine around a fixed storage bandwidth. And I doubt they will in the long term, as they seem to be expanding their games to PC and want their technology to be ready for the next console iteration. I think they will only do it during the PS5's first year or two, in order to have a showcase for their console.

I assume there are already some brilliant minds looking at how to scale properly with bandwidth, allowing optimal results at any bandwidth, and we will start to see their solutions at GDC 2022.

Although there are already some scalable technologies announced, like Epic's Nanite.
     
  15. Infinisearch

    Veteran

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
So an overprovisioned pool/chunk of memory is out of the question?
edit - as in pool allocators, and for anything that doesn't fit into those, a power-of-two malloc that runs off pools.
     
    #315 Infinisearch, Sep 3, 2020
    Last edited: Sep 3, 2020
  16. Infinisearch

    Veteran

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
Isn't this technology only for games where you can traverse parts of the level, either by teleportation or rapid movement, for a fairly extended period of time? I never got the impression they were trying to do away with 'VRAM'.
     
  17. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,834
    Likes Received:
    18,634
    Location:
    The North
There is that, and largely I think the goal is to allow games to be designed without the need for gates in your level design, where a small QTE occurs to unload parts of a level and load in the next parts.

And then there are other design paradigms where extremely detailed worlds and textures in an area forced the playing field to shrink immensely; having that level of fidelity across a vast area wasn't possible due to memory limitations.

And then everything else you sort of indicated.
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    No.

    I can imagine that PS5 style "continuously streaming textures" actually reduces the pressure on VRAM. With ultra-low latency texture streaming the problem becomes how much space is there on the disk for the game install, not VRAM.

For PC games I doubt latencies will ever allow for PS5-style ultra-streaming game engines, because the lowest common denominator of a PC with 300 MB/s max disk bandwidth is unavoidable.
     
  19. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,237
    Likes Received:
    4,260
    Location:
    Guess...
NVMe drives could be made a minimum requirement, I guess. Simply going from 4K textures to 2K textures would quarter your texture streaming requirements and let you scale from top-end NVMes down to the bottom end.

    Relative to consoles, decent amounts of system RAM should also alleviate a lot of pressure on the storage IO with good pre-caching.

    I wouldn't be that surprised to see requirements along the lines of "32GB RAM with SATA SSD or 16GB RAM with NVMe SSD"
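The quartering claim is just texel arithmetic: halving each edge of a square texture quarters its area and therefore its bytes. A back-of-envelope sketch (the one-byte-per-texel figure is an assumed stand-in for a block-compressed format):

```python
def texture_bytes(resolution: int, bytes_per_texel: float = 1.0) -> float:
    """Bytes for one square texture with the given edge resolution.
    bytes_per_texel=1.0 roughly approximates a block-compressed texture."""
    return resolution * resolution * bytes_per_texel

# Dropping from 4K to 2K textures cuts streaming load to 25%.
ratio = texture_bytes(2048) / texture_bytes(4096)
print(ratio)  # 0.25
```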
     
    tinokun, Dictator, PSman1700 and 3 others like this.
  20. manux

    Veteran

    Joined:
    Sep 7, 2002
    Messages:
    3,034
    Likes Received:
    2,276
    Location:
    Self Imposed Exhile
So few people have 32GB of RAM that it really becomes easier to get folks to buy an entry-level NVMe SSD. Buy an NVMe SSD instead of a memory upgrade. Funnily enough, the laptop peasants are ahead of the SSD curve here.
     
    #320 manux, Sep 3, 2020
    Last edited: Sep 3, 2020


  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.