Support for Machine Learning (ML) on PS5 and Series X?

Discussion in 'Console Technology' started by Shifty Geezer, Mar 18, 2020.

  1. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,870
    Likes Received:
    10,965
    Location:
    The North
    It’s the worst.
    There’s a whole business of patent trolling. My friend has had to represent MS on multiple occasions. Worst job ever. The prosecution just keeps asking the same question repeatedly in different ways until you answer differently, and then they say, ah ha, you are patent infringing, or we are not patent infringing.

    It’s so dumb. The whole patent process needs to be redone.
     
    tinokun, Silent_Buddha, DSoup and 2 others like this.
  2. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    16,020
    Likes Received:
    15,010
    Location:
    Cleveland
    That's already patented and any such use infringes.
     
    function, tinokun, Jay and 2 others like this.
  3. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    16,020
    Likes Received:
    15,010
    Location:
    Cleveland
  4. Ronaldo8

    Newcomer

    Joined:
    May 18, 2020
    Messages:
    233
    Likes Received:
    232
    You were not far off. The patent "Reducing the search space for real time texture compression" (Patent 1) is the complement of the patent "Machine learning applied to textures compression or upscaling" (Patent 2).
    More specifically, Patent 1 states:

    • For example, the compressed textures 14 may be compressed using any one of a plurality of compression algorithms or schemes that are incompatible with the GPU, but which provide a substantially greater compression ratio as compared to a GPU-compatible compression algorithm/scheme (e.g., block compression). For example, hardware compatible compressed texture formats may be 0.5 or 1 bytes per texel, while machine learning can go as low as half of 0.5 or 1 bytes per texel. The compressed textures 14 may be compressed using machine learning image compression, JPEG, wavelet, general purpose lossless compression, and/or other forms of compression that result in a relatively high compression ratio. As noted, however, the compressed textures 14 may not be directly usable by the GPU. As such, the resulting hardware incompatible image generated by compressed textures 14 may be block compressed prior to usage by the GPU.
    • [0027]
      Application 10 may also include a metadata file 16 with one or more hints 20 (e.g., up to n hints, where n is an integer) as to which mode, shape and/or end points to choose for the textures 26 when block compressing the texture 26 for application 10 at runtime. For example, the hints 20 may indicate that while there may be 8 different modes for an identified region of a texture 26, this identified region of a texture 26 only uses modes 1 and 2. In addition or alternatively, for example, the hints 20 may indicate which shapes are common shapes for the identified region of a texture 26. The hints 20 may also provide information regarding a subset of the search space defined by a subset of one or more of the mode, the shape, or the endpoints to choose for compressing the decompressed textures into hardware compatible compressed textures that includes less than all of the potential combinations that make up the search space. For example, the subset may include one or more of the mode, the shape, or the endpoints, or one or more of a combination of the mode, the shape, or the endpoints. As such, the hints 20 may be used to reduce the search space when determining how to efficiently compress the textures 26 at runtime.


    Hence, Patent 1 describes a process whereby the offline compression engine (which may involve a machine learning model) compresses the application's textures into a GPU-incompatible format, along with a metadata file whose hints accelerate conversion into block-compressed, GPU-compatible textures by constraining the search over which modes, shapes and endpoints give the best-quality blocks. Patent 1 thus neatly explains how we end up with highly compressed, GPU-incompatible textures on disk in the first place.
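
    To get a feel for how much the hints could shrink the search, here is a rough back-of-the-envelope sketch. The per-mode shape counts are modeled on BC7's layout but should be treated as illustrative, and `ENDPOINT_CANDIDATES`, `shape_limit` and the function name are my own assumptions, not anything taken from the patent.

```python
# Shapes (partitions) selectable per mode; illustrative numbers modeled on BC7.
SHAPES_PER_MODE = {0: 16, 1: 64, 2: 64, 3: 64, 4: 1, 5: 1, 6: 1, 7: 64}

# Hypothetical number of endpoint sets tried for each (mode, shape) pair.
ENDPOINT_CANDIDATES = 32


def search_space_size(modes, endpoint_candidates, shape_limit=None):
    """Count the (mode, shape, endpoint) combinations an encoder would try."""
    total = 0
    for mode in modes:
        shapes = SHAPES_PER_MODE[mode]
        if shape_limit is not None:
            # A hint listing only the "common shapes" caps the shapes tried.
            shapes = min(shapes, shape_limit)
        total += shapes * endpoint_candidates
    return total


# No hints: every mode and every shape is a candidate.
full = search_space_size(range(8), ENDPOINT_CANDIDATES)

# Hinted: "this region only uses modes 1 and 2", plus 8 common shapes.
hinted = search_space_size([1, 2], ENDPOINT_CANDIDATES, shape_limit=8)

print(f"full search:   {full} combinations")
print(f"hinted search: {hinted} combinations ({full / hinted:.1f}x fewer)")
```

    The absolute numbers mean little; the point is only that restricting modes and shapes multiplies through the whole search.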

    Following hot on the heels of Patent 1, Patent 2 then describes how those GPU-incompatible textures can now be converted into regular block compressed textures at runtime. Quoting Patent 2:

    • For example, the trained machine learning model 18 may decompress the identified textures 17 into block compressed textures usable by the GPU by predicting the components of a blocked compressed texture (e.g., the modes, shapes, endpoints, and/or indices) for the identified textures 17 and/or a region of the texture 17. The predicted block compressed textures may select various modes, shapes, and/or endpoints to use during the block compression for the identified textures 17 and/or a region of the texture 17.
    • [0042]
      The machine learning networks may evaluate the image and visual quality of the predicted blocked compressed textures generated during the training process by comparing the predicted block compressed textures to the original source textures 17 used as input for the training. The machine learning networks may try to improve the predicted block compressed textures (e.g., modifying the selected modes, shapes, and/or endpoints) until there is a minimal difference between the predicted block compressed textures and the original source textures 17. When a minimal difference occurs between the predicted block compressed textures and the original source textures 17, it may be difficult to distinguish the predicted block compressed textures and the original source textures 17.
    • [0043]
      The selected modes, shapes and/or endpoints used when predicting the blocked compressed textures may be saved as metadata 19. Metadata 19 may provide guidance as to which modes, shapes, and/or end points may produce the best quality blocks. As such, metadata 19 may be used by the trained machine learning model 18 to create hardware compatible compressed textures 22 that closely resemble the original raw images of application 10. For example, metadata 19 may be used by the trained machine learning model 18 to assist in selecting correct endpoints, modes, and/or shapes of a block compressed texture when decompressing the hardware incompatible compressed textures 16 directly into block compressed number (BCN) textures.

    We now have a more or less complete picture of the texture management pipeline. During runtime conversion of GPU-incompatible textures, the machine learning model searches the state space of modes, shapes and endpoints for the combination that best matches the source with minimal loss of detail. The metadata constrains the search space for this best fit and keeps the loss of detail minimal, provided that the model has been properly trained offline with relevant training data (the application's existing textures).
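
    As a toy illustration of the "constrained best fit" idea (and only that), the sketch below encodes one 4x4 grayscale block with a BC1-style two-endpoint palette. It swaps the patent's ML model for a plain brute-force search, and the hinted endpoint list is a made-up stand-in for the metadata, so take it as an assumption-laden sketch rather than what the patent actually does.

```python
import itertools


def make_palette(lo, hi):
    """BC1-style palette: two endpoints plus two interpolated values."""
    return [lo, (2 * lo + hi) / 3, (lo + 2 * hi) / 3, hi]


def encode_block(texels, endpoint_values):
    """Try every endpoint pair; keep the one with the lowest squared error.

    Returns ((error, lo, hi, indices), number_of_pairs_tried).
    """
    best = None
    tried = 0
    for lo, hi in itertools.combinations(endpoint_values, 2):
        palette = make_palette(lo, hi)
        # Each texel picks the closest of the four palette entries.
        indices = [min(range(4), key=lambda i: (palette[i] - t) ** 2)
                   for t in texels]
        error = sum((palette[i] - t) ** 2 for i, t in zip(indices, texels))
        tried += 1
        if best is None or error < best[0]:
            best = (error, lo, hi, indices)
    return best, tried


# One 4x4 block of grayscale texels (a diagonal gradient).
texels = [40, 40, 93, 93,
          40, 93, 147, 147,
          93, 147, 200, 200,
          147, 200, 200, 200]

# Unhinted: every endpoint value on an 8-step grid is a candidate.
full_best, full_tried = encode_block(texels, range(0, 256, 8))

# Hinted: hypothetical metadata narrows endpoints to a few likely values.
hint_best, hint_tried = encode_block(texels, [32, 40, 48, 192, 200, 208])

print(f"full:   endpoints {full_best[1:3]} after {full_tried} tries")
print(f"hinted: endpoints {hint_best[1:3]} after {hint_tried} tries")
```

    Both searches land on the same endpoints here, but the hinted one evaluates a small fraction of the pairs, which is the whole appeal if this has to happen at runtime.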

    The question is....is this actually BCPACK? My hunch is that BCPACK has nothing to do with this scheme and is just an analogue of crunch.
     
    #164 Ronaldo8, Jul 3, 2020
    Last edited: Jul 3, 2020
  5. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,436
    Likes Received:
    1,498
    The Xbox One had this capability in hardware, as one of the DMEs has JPEG decompression hardware. JPEG has long been looked at as a distribution format for textures because the storage needed for textures has been growing in an exponential-like manner for a while now. The problem is that most compression schemes like JPEG that offer great compression ratios at minimal loss in quality are incompatible with current GPU hardware.

    These patents seem to be focused on improving the transcoding step by making it more performant without the need for heavy engineering by devs.

    Provide hints in the form of metadata to the block compressor so it can be speedier. Use ML instead of handtuning block compression for each texture offline and rolling your own metadata to provide to the compressor.
     
    tinokun and iroboto like this.
  6. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    2,737
    Likes Received:
    1,844
    Forgot about the DME jpeg, did it actually get used and if not, why not?
    I don't remember hearing it being used for texture conversion.
    The other DME had standard zlib decompression from what I remember, easier to see the use.
     
  7. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,436
    Likes Received:
    1,498
    IDK. But that seems like a huge undertaking and a roundabout way to get to 6 GB/s for BCPack.

    If 6 GB/s refers to the rate at which the hardware block compresses, it’s chewing through a ton of data. BCn formats usually offer compression ratios of about 4:1 to 8:1 (BC1 offering the best compression but the worst quality), so the hardware would potentially be chewing through 24-48 GB/s of uncompressed data.
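
    For anyone who wants to check the arithmetic, a quick sketch follows. It assumes the uncompressed source is RGBA8 (4 bytes per texel) and uses the standard BC1 and BC7 block sizes; the function names are just mine.

```python
RAW_BYTES_PER_TEXEL = 4              # assumption: uncompressed RGBA8 source
BLOCK_TEXELS = 4 * 4                 # BCn formats encode 4x4 texel blocks
BLOCK_BYTES = {"BC1": 8, "BC7": 16}  # bytes per compressed block


def compression_ratio(fmt):
    """Ratio of raw RGBA8 size to block-compressed size."""
    bytes_per_texel = BLOCK_BYTES[fmt] / BLOCK_TEXELS
    return RAW_BYTES_PER_TEXEL / bytes_per_texel


def equivalent_raw_rate(bcn_rate_gb_s, fmt):
    """Uncompressed data rate implied by a given BCn output rate."""
    return bcn_rate_gb_s * compression_ratio(fmt)


for fmt in ("BC7", "BC1"):
    print(f"{fmt}: {compression_ratio(fmt):.0f}:1, "
          f"6 GB/s of {fmt} is {equivalent_raw_rate(6, fmt):.0f} GB/s raw")
```

    That is where the 24-48 GB/s range comes from: 6 GB/s times 4 at the BC7 end and times 8 at the BC1 end.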

    Seems like a ton of work considering the alternative is to just supercompress offline with both lossy and lossless compression and have the hardware perform lossless decompression to VRAM.
     
  8. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,436
    Likes Received:
    1,498
    Rage used JPEG XR but I’m not aware of what was used by the XB1.

    And the One offered both capabilities because not all data can use lossy compression, and zlib wasn’t limited to texture use.
     
  9. Ronaldo8

    Newcomer

    Joined:
    May 18, 2020
    Messages:
    233
    Likes Received:
    232
    ?.
    The decompression block is for zlib and BCPACK, not for any other format like JPEG. And James Stanard has strongly hinted that BCPACK bears some important homology with crunch. What is being discussed is the possibility of converting textures in GPU incompatible format into block compressed format using async compute at runtime through an adversarial ML model that will explore the subspace of endpoints/mode configuration on a trial and error basis and predict the expected output until the optimal fit is found.
     
  10. Arwin

    Arwin Now Officially a Top 10 Poster
    Moderator Legend

    Joined:
    May 17, 2006
    Messages:
    18,071
    Likes Received:
    1,664
    Location:
    Maastricht, The Netherlands
    In principle all machine learning we see today is basically statistics and without exception the resulting algorithms can be exported as a small runnable program of at most 1-2MB and run by most hardware, including pretty weak smartphones. So you don’t really need machine learning optimized hardware for that.

    Will be interesting to see applications where the hardware is actually used during gameplay to learn something, as you need a lot of data. So in a sense it makes more sense to send data from all players to a single cloud based machine learning system and export runtimes from there.
     
    JPT likes this.
  11. Ronaldo8

    Newcomer

    Joined:
    May 18, 2020
    Messages:
    233
    Likes Received:
    232
    The X1, while horribly underpowered, foreshadowed many of the upcoming gen features with its MOVE engines and modified command processor.
     
  12. Ronaldo8

    Newcomer

    Joined:
    May 18, 2020
    Messages:
    233
    Likes Received:
    232
    While it is indeed true that low powered processors run AI inference all the time nowadays, they pale in comparison to the efficiency and gains in latency provided by GPUs configured for ML workloads especially compute intensive processes like image decompression/reconstruction.
    Hence....invest in tensor cores.
     
  13. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,436
    Likes Received:
    1,498
    What do you think crunch does?

    Textures are compressed into a GPU-incompatible texture format that is transcoded on the fly into a block compression format (dependent on GPU arch) that can be used by the GPU.

    The biggest difference is that the initial compression format is designed to allow for transcoding without the need of a recompression step.

    You can say that the patents’ intent is to accomplish what crunch does with an alternative method.

    We have absolutely no idea how BCPack works other than the assumption that “BC” is related to BCn formats. Plus the patents don’t require JPEG, it’s just offered as one example of different GPU-incompatible compression schemes that can be used.
     
    #173 dobwal, Jul 4, 2020
    Last edited: Jul 4, 2020
  14. Ronaldo8

    Newcomer

    Joined:
    May 18, 2020
    Messages:
    233
    Likes Received:
    232
    Maybe it's a misunderstanding on my part, but I don't follow. BCPACK, like Kraken and Crunch, is rumored to be a further compression of BCn formats. You decompress a BCPACK-compressed texture block through the hardware decompressor and you end up with your texture in BCn format. Completely straightforward, with no recompression step. What is implied in the patent is that you can directly transcode from a plurality of hardware-incompatible formats into block-compressed textures using ML and hints contained in the metadata.
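
    To make the "further compression of BCn" path concrete, here is a minimal sketch using zlib as a stand-in lossless layer. BCPACK's and Kraken's actual formats are not described anywhere in this thread, so zlib, the fake block contents and the helper name are all assumptions for illustration.

```python
import struct
import zlib


def make_fake_bc1_blocks(n_blocks):
    """Build placeholder BC1-sized blocks (8 bytes each).

    Each block packs two 16-bit endpoint colors and 32 bits of 2-bit
    indices. The values are arbitrary filler, not a real texture.
    """
    blocks = bytearray()
    for i in range(n_blocks):
        color0 = 0xF800 - (i % 4)
        color1 = 0x001F + (i % 4)
        indices = 0x55AA55AA
        blocks += struct.pack("<HHI", color0, color1, indices)
    return bytes(blocks)


# Offline: block compress first (faked here), then losslessly pack for disk.
bcn_data = make_fake_bc1_blocks(1024)
packed = zlib.compress(bcn_data, 9)

# Runtime: one lossless decompress gives back the exact BCn bytes,
# ready for the GPU with no re-encoding step.
restored = zlib.decompress(packed)

print(f"BCn size: {len(bcn_data)} bytes, on disk: {len(packed)} bytes")
print("bit-exact after decompress:", restored == bcn_data)
```

    The contrast with the patents is that nothing here ever leaves the BCn domain, so there is no search over modes or endpoints at runtime.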
     
    #174 Ronaldo8, Jul 4, 2020
    Last edited: Jul 4, 2020
  15. Ronaldo8

    Newcomer

    Joined:
    May 18, 2020
    Messages:
    233
    Likes Received:
    232
    This should bypass the need for a specific hardware decompression/transcoding engine provided you have enough compute at runtime to dedicate to it (huge implication with regards to PC vs next-gen consoles). Also BCPACK was described as the further compression of BCn formats by James Stanard himself, which was what originally caught the attention of Richard Geldreich, the inventor of Crunch.
     
    #175 Ronaldo8, Jul 4, 2020
    Last edited: Jul 4, 2020
  16. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,436
    Likes Received:
    1,498
    Kraken is just a lossless compression scheme like zlib but offers slightly better compression and way better decoding speeds.

    Crunch involves lossy compression that is incompatible with GPUs but offers similar performance to JPEG. The format is transcoded on the fly to a BCn format into VRAM. There is an RDO-LZ mode that’s offered to provide easier integration. Crunch offers better compression than its RDO mode. Its most promoted feature is that the initial compression format can be used across different GPU archs because its transcoder supports different texture block compression formats (BCn, ETC, PVRTC, etc).

    We are assuming that the PS5 does something similar to RDO-LZ but with Kraken (RDO+Kraken). There is an assumption that the XSX offers something similar in BCPack.

    But ultimately that’s an assumption as we know little of BCPack, so when you asked if the patents were BCPack related, I simply stated “I don’t know”. The patents look something like crunch with its transcoding mode, except they don’t seem to avoid the recompression step but rather find different ways to speed it up using a metadata file.

    Also, RDO+LZ or Kraken does look more straightforward. That’s why I stated the patents seem to involve a lot of real-time hardware work to do what can be simply done offline.
     
    #176 dobwal, Jul 4, 2020
    Last edited: Jul 4, 2020
    tinokun likes this.
  17. Ronaldo8

    Newcomer

    Joined:
    May 18, 2020
    Messages:
    233
    Likes Received:
    232
    It's not a recompression step, it's direct transcoding into a BCn format using trial and error (the operation can even time out). Because the search space for mode/endpoint configurations is very large, metadata is created during initial compression to constrict the search space.
     
  18. Ronaldo8

    Newcomer

    Joined:
    May 18, 2020
    Messages:
    233
    Likes Received:
    232
  19. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,436
    Likes Received:
    1,498
    Transcoding is often a two-step process, with the initial step being decoding to an intermediate uncompressed format. The second step is to encode into the desired format. Crunch seems to have the ability to avoid this step. The initial compression (CRN) format is designed to allow for conversion from the compressed format itself. Or at least that’s my impression from the writings of Geldreich, the creator of crunch.

    And my impression from the patents is that the metadata file isn’t generated from the initial GPU-incompatible compression step. Rather, an ML algorithm basically repeatedly BCn block compresses and decompresses the texture and figures out the best config to use at a given quality. A metadata file is generated and then the texture is compressed using something with similar performance to JPEG. The metadata is fed to the transcoder, which helps speed up performance.
     
    #179 dobwal, Jul 4, 2020
    Last edited: Jul 4, 2020
    tinokun likes this.
  20. Ronaldo8

    Newcomer

    Joined:
    May 18, 2020
    Messages:
    233
    Likes Received:
    232
    Compression into a GPU-incompatible format on disk is done offline, and the result is available for run-time transcoding on the fly from disk. That's like the whole point of the patent. The interesting feature is that the transcoding engine can actually be an ML model run by async compute, which can be important for PCs lacking the hardware decompression block of the consoles but having dedicated tensor cores instead.
     