Support for Machine Learning (ML) on PS5 and Series X?

iroboto · Jul 3, 2020

Shifty Geezer said:
Imagine trying to be a judge and decide what patents are and aren't infringed! If it were me, I'd disqualify all patents for being written in gobbledegook! If it's not obvious what you're patenting, you aren't really patenting anything.

Patent culture has completely corrupted the idea into opportunistic obfuscation.

Maybe ML can be purposed to translate patent speak into real language?

It’s the worst.
There’s a whole business of patent trolling. My friend has had to represent MS on multiple occasions. Worst job ever. The prosecution just keeps asking the same question repeatedly in different ways until you answer differently, and then they say, aha, you are infringing the patent, or we are not infringing the patent.

It’s so dumb. The whole patent process needs to be redone.

BRiT · Jul 3, 2020

Shifty Geezer said:
Maybe ML can be purposed to translate patent speak into real language?

That's already patented and any such use infringes.

BRiT · Jul 3, 2020

Discussion Thread on Neural SuperSampling: https://forum.beyond3d.com/threads/neural-supersampling-facebook-researchers-2020.61862/

Ronaldo8 · Jul 3, 2020

iroboto said:
hmm indeed.

this is why I hate patent diving. There are so many, and so many related ones.
This patent was linked for instance:
http://www.freepatentsonline.com/y2019/0304138.html

and I thought it was the same as this patent here:
https://patentscope.wipo.int/search....wapp2nA?docId=US253950223&tab=PCTDESCRIPTION

and I started writing about the second one, in reference to the first one. Looked like a bumbling idiot. So yea, I get where you're going. I'm not going to read either, this stuff is mentally exhausting.

You were not far off. The patent "Reducing the search space for real time texture compression" (Patent 1) is the complement of the patent "Machine learning applied to textures compression or upscaling" (Patent 2).
More specifically, Patent 1 states:

For example, the compressed textures 14 may be compressed using any one of a plurality of compression algorithms or schemes that are incompatible with the GPU, but which provide a substantially greater compression ratio as compared to a GPU-compatible compression algorithm/scheme (e.g., block compression). For example, hardware compatible compressed texture formats may be 0.5 or 1 bytes per texel, while machine learning can go as low as half of 0.5 or 1 bytes per texel. The compressed textures 14 may be compressed using machine learning image compression, JPEG, wavelet, general purpose lossless compression, and/or other forms of compression that result in a relatively high compression ratio. As noted, however, the compressed textures 14 may not be directly usable by the GPU. As such, the resulting hardware incompatible image generated by compressed textures 14 may be block compressed prior to usage by the GPU.
[0027]
Application 10 may also include a metadata file 16 with one or more hints 20 (e.g., up to n hints, where n is an integer) as to which mode, shape and/or end points to choose for the textures 26 when block compressing the texture 26 for application 10 at runtime. For example, the hints 20 may indicate that while there may be 8 different modes for an identified region of a texture 26, this identified region of a texture 26 only uses modes 1 and 2. In addition or alternatively, for example, the hints 20 may indicate which shapes are common shapes for the identified region of a texture 26. The hints 20 may also provide information regarding a subset of the search space defined by a subset of one or more of the mode, the shape, or the endpoints to choose for compressing the decompressed textures into hardware compatible compressed textures that includes less than all of the potential combinations that make up the search space. For example, the subset may include one or more of the mode, the shape, or the endpoints, or one or more of a combination of the mode, the shape, or the endpoints. As such, the hints 20 may be used to reduce the search space when determining how to efficiently compress the textures 26 at runtime.

Hence, Patent 1 describes a process whereby the offline compression engine (which may involve a machine learning model) compresses the textures of the application into a GPU-incompatible format, along with a metadata file of hints that accelerate the later conversion into block-compressed, GPU-compatible textures by constraining the search over which modes, shapes and endpoints will give the best quality blocks. So Patent 1 provides a neat explanation of how we end up with highly compressed, GPU-incompatible textures on disk in the first place.
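To get a feel for how much a hint of this kind could shrink the search, here is a rough back-of-the-envelope count in Python. The 8 modes come from the patent's own example; the shape count, the number of endpoint pairs and the `RegionHint` layout are assumptions made up for illustration, since the patent does not specify a metadata format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionHint:
    """Hypothetical per-region hint record; the patent does not define a layout."""
    allowed_modes: tuple
    allowed_shapes: tuple


def candidate_count(modes, shapes, endpoint_pairs):
    """Number of (mode, shape, endpoint pair) combinations an encoder would try."""
    return len(modes) * len(shapes) * endpoint_pairs


ALL_MODES = tuple(range(8))    # the patent's example mentions 8 modes
ALL_SHAPES = tuple(range(64))  # assumed shape count, for illustration only
ENDPOINT_PAIRS = 256           # assumed number of endpoint pairs tried per combination

# A hint in the spirit of "this region only uses modes 1 and 2".
hint = RegionHint(allowed_modes=(1, 2), allowed_shapes=(0, 3, 7, 12))

full = candidate_count(ALL_MODES, ALL_SHAPES, ENDPOINT_PAIRS)
hinted = candidate_count(hint.allowed_modes, hint.allowed_shapes, ENDPOINT_PAIRS)
print(f"full search: {full}, hinted search: {hinted}, reduction: {full // hinted}x")
```

Real block formats do not combine every mode with every shape, so the absolute numbers mean little; the point is only that restricting modes and shapes multiplies through the whole search.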

Following hot on the heels of Patent 1, Patent 2 then describes how those GPU-incompatible textures can now be converted into regular block compressed textures at runtime. Quoting Patent 2:

For example, the trained machine learning model 18 may decompress the identified textures 17 into block compressed textures usable by the GPU by predicting the components of a blocked compressed texture (e.g., the modes, shapes, endpoints, and/or indices) for the identified textures 17 and/or a region of the texture 17. The predicted block compressed textures may select various modes, shapes, and/or endpoints to use during the block compression for the identified textures 17 and/or a region of the texture 17.
[0042]
The machine learning networks may evaluate the image and visual quality of the predicted blocked compressed textures generated during the training process by comparing the predicted block compressed textures to the original source textures 17 used as input for the training. The machine learning networks may try to improve the predicted block compressed textures (e.g., modifying the selected modes, shapes, and/or endpoints) until there is a minimal difference between the predicted block compressed textures and the original source textures 17. When a minimal difference occurs between the predicted block compressed textures and the original source textures 17, it may be difficult to distinguish the predicted block compressed textures and the original source textures 17.
[0043]
The selected modes, shapes and/or endpoints used when predicting the blocked compressed textures may be saved as metadata 19. Metadata 19 may provide guidance as to which modes, shapes, and/or end points may produce the best quality blocks. As such, metadata 19 may be used by the trained machine learning model 18 to create hardware compatible compressed textures 22 that closely resemble the original raw images of application 10. For example, metadata 19 may be used by the trained machine learning model 18 to assist in selecting correct endpoints, modes, and/or shapes of a block compressed texture when decompressing the hardware incompatible compressed textures 16 directly into block compressed number (BCN) textures.

We now have a more or less complete picture of the texture management pipeline. During runtime conversion of GPU-incompatible textures, the machine learning model searches the state space of modes, shapes and endpoints to find the combination that gives the best match, i.e. minimal loss of detail. The metadata constrains the search space for this best fit and ensures that there is minimal loss of detail, provided that the model has been properly trained offline with relevant training data (existing textures of the application).
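As a toy illustration of that constrained search (not the patent's method, and with no ML involved), the sketch below brute-forces the endpoint pair for a single 4x4 greyscale block using BC1-style four-level interpolation, first over every candidate endpoint and then over a range that a hypothetical hint might supply. The block data, the candidate step of 4 and the hinted range are all made up.

```python
from itertools import combinations
from math import comb


def palette(lo, hi, levels=4):
    """Evenly spaced values between two endpoints, BC1-style interpolation."""
    return [lo + (hi - lo) * i / (levels - 1) for i in range(levels)]


def encode_block(texels, endpoint_candidates):
    """Try every endpoint pair; return (error, lo, hi, indices) with the lowest squared error."""
    best = None
    for lo, hi in combinations(endpoint_candidates, 2):
        pal = palette(lo, hi)
        # Each texel picks the nearest palette entry.
        indices = [min(range(len(pal)), key=lambda i: (pal[i] - t) ** 2) for t in texels]
        err = sum((pal[i] - t) ** 2 for i, t in zip(indices, texels))
        if best is None or err < best[0]:
            best = (err, lo, hi, indices)
    return best


# Made-up 4x4 greyscale block, flattened to 16 texels.
block = [16, 18, 20, 22, 40, 42, 44, 46, 64, 66, 68, 70, 88, 90, 92, 94]

full_space = range(0, 256, 4)    # every candidate endpoint value
hinted_space = range(8, 104, 4)  # hypothetical hint: endpoints lie between 8 and 100

full_best = encode_block(block, full_space)
hinted_best = encode_block(block, hinted_space)

print("pairs tried without hint:", comb(len(full_space), 2))
print("pairs tried with hint:   ", comb(len(hinted_space), 2))
print("same result:", full_best == hinted_best)
```

When the hint is accurate, the narrower search lands on the same endpoints while trying far fewer pairs; when the hint is wrong, the hinted search would settle for a worse block, which is why the quality of the offline step matters.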

The question is....is this actually BCPACK? My hunch is that BCPACK has nothing to do with this scheme and is just an analogue of crunch.

dobwal · Jul 4, 2020

Jay said:
I can't remember the details either, to be fair.
But the important bit is from a non gpu texture format to a supported one.
I'm talking about the download package, install size, runtime and when the inferencing could be done.

The Xbox One had this capability in hardware, as one of the DMEs has JPEG decompression hardware. JPEG has been looked at for a long time as a distribution format for textures because the storage needed to accommodate textures has been growing in an exponential-like manner for a while now. The problem is that most compression schemes like JPEG that offer great compression ratios at minimal loss in quality are incompatible with current GPU hardware.

These patents seem to be focused on improving the transcoding step by making it more performant without the need for heavy engineering by devs.

Provide hints in the form of metadata to the block compressor so it can be speedier. Use ML instead of hand-tuning block compression for each texture offline and rolling your own metadata to provide to the compressor.

Jay · Jul 4, 2020

dobwal said:
The Xbox One had this capability in hardware, as one of the DMEs has JPEG decompression hardware. JPEG has been looked at for a long time as a distribution format for textures because the storage needed to accommodate textures has been growing in an exponential-like manner for a while now.

Forgot about the DME jpeg, did it actually get used and if not, why not?
I don't remember hearing it being used for texture conversion.
The other DME had standard zlib decompression from what I remember, easier to see the use.

dobwal · Jul 4, 2020

Ronaldo8 said:
The question is....is this actually BCPACK? My hunch is that BCPACK has nothing to do with this scheme and is just an analogue of crunch.

IDK. But that seems like a huge undertaking and a roundabout way to get to 6 GB/s for BCPack.

If 6 GB/s refers to the rate at which the hardware outputs block-compressed data, it’s chewing through a ton of data. BCn formats usually offer compression ratios of about 4:1 to 8:1 (BC1 offering the best compression but the worst quality). The hardware is potentially chewing through the equivalent of 24-48 GB/s of uncompressed data.

Seems like a ton of work considering the alternative is to just supercompress offline with both lossy and lossless and just have the hardware perform lossless decompression to VRAM.
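The arithmetic behind that 24-48 figure can be spelled out as below. It assumes the comparison is against uncompressed RGBA8 (4 bytes per texel) and that the 6 GB/s figure is BCn data coming out of the decompressor, neither of which has been confirmed; the 0.5 and 1 byte-per-texel values match the ones quoted from Patent 1.

```python
RGBA8_BYTES_PER_TEXEL = 4.0  # assumed uncompressed baseline


def ratio_vs_rgba8(bytes_per_texel):
    """Compression ratio of a block format relative to uncompressed RGBA8."""
    return RGBA8_BYTES_PER_TEXEL / bytes_per_texel


def uncompressed_equivalent(bcn_gb_per_s, bytes_per_texel):
    """GB/s of RGBA8 texels represented by a given rate of block-compressed data."""
    return bcn_gb_per_s * ratio_vs_rgba8(bytes_per_texel)


BCN_RATE = 6.0  # GB/s; assumes the quoted figure is BCn data leaving the decompressor

low = uncompressed_equivalent(BCN_RATE, 1.0)   # 1 byte per texel (e.g. BC7), 4:1
high = uncompressed_equivalent(BCN_RATE, 0.5)  # 0.5 bytes per texel (e.g. BC1), 8:1
print(f"{low:.0f}-{high:.0f} GB/s of uncompressed texels")
```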

dobwal · Jul 4, 2020

Jay said:
Forgot about the DME jpeg, did it actually get used and if not, why not?
I don't remember hearing it being used for texture conversion.
The other DME had standard zlib decompression from what I remember, easier to see the use.

Rage used JPEG XR, but I'm not aware of what was used on the XB1.

And the One offered both capabilities because not all data can use lossy compression, and zlib wasn’t just limited to texture use.

Ronaldo8 · Jul 4, 2020

dobwal said:
IDK. But that seems like a huge undertaking and a roundabout way to get to 6 GB/s for BCPack.

If 6 GB/s refers to the rate at which the hardware outputs block-compressed data, it’s chewing through a ton of data. BCn formats usually offer compression ratios of about 4:1 to 8:1 (BC1 offering the best compression but the worst quality). The hardware is potentially chewing through the equivalent of 24-48 GB/s of uncompressed data.

Seems like a ton of work considering the alternative is to just supercompress offline with both lossy and lossless and just have the hardware perform lossless decompression to VRAM.

?.
The decompression block is for zlib and BCPACK, not for any other format like JPEG. And James Stanard has strongly hinted that BCPACK bears some important homology with crunch. What is being discussed is the possibility of converting textures in GPU incompatible format into block compressed format using async compute at runtime through an adversarial ML model that will explore the subspace of endpoints/mode configuration on a trial and error basis and predict the expected output until the optimal fit is found.

Arwin · Jul 4, 2020

In principle all machine learning we see today is basically statistics and without exception the resulting algorithms can be exported as a small runnable program of at most 1-2MB and run by most hardware, including pretty weak smartphones. So you don’t really need machine learning optimized hardware for that.

Will be interesting to see applications where the hardware is actually used during gameplay to learn something, as you need a lot of data. So in a sense it makes more sense to send data from all players to a single cloud based machine learning system and export runtimes from there.

Ronaldo8 · Jul 4, 2020

dobwal said:
Rage used JPEG XR, but I'm not aware of what was used on the XB1.

And the One offered both capabilities because not all data can use lossy compression, and zlib wasn’t just limited to texture use.

The X1, while horribly underpowered, foreshadowed many of the upcoming gen features with its MOVE engines and modified command processor.

Ronaldo8 · Jul 4, 2020

Arwin said:
In principle all machine learning we see today is basically statistics and without exception the resulting algorithms can be exported as a small runnable program of at most 1-2MB and run by most hardware, including pretty weak smartphones. So you don’t really need machine learning optimized hardware for that.

Will be interesting to see applications where the hardware is actually used during gameplay to learn something, as you need a lot of data. So in a sense it makes more sense to send data from all players to a single cloud based machine learning system and export runtimes from there.

While it is indeed true that low-powered processors run AI inference all the time nowadays, they pale in comparison to the efficiency and latency gains provided by GPUs configured for ML workloads, especially for compute-intensive processes like image decompression/reconstruction.
Hence....invest in tensor cores.

dobwal · Jul 4, 2020

Ronaldo8 said:
?.
The decompression block is for zlib and BCPACK, not for any other format like JPEG. And James Stanard has strongly hinted that BCPACK bears some important homology with crunch. What is being discussed is the possibility of converting textures in GPU incompatible format into block compressed format using async compute at runtime through an adversarial ML model that will explore the subspace of endpoints/mode configuration on a trial and error basis and predict the expected output until the optimal fit is found.

What do you think crunch does?

Textures are compressed into a GPU-incompatible texture format that is transcoded on the fly into a block compression format (dependent on GPU arch) that can be used by the GPU.

The biggest difference is that the initial compression format is designed to allow for transcoding without the need of a recompression step.

You can say that the patents’ intent is to accomplish what crunch does with an alternative method.

We have absolutely no idea how BCPack works other than the assumption that “BC” is related to BCn formats. Plus the patents don’t require JPEG, it’s just offered as one example of the different GPU-incompatible compression schemes that can be used.

Ronaldo8 · Jul 4, 2020

dobwal said:
What do you think crunch does?

Textures are compressed into a GPU-incompatible texture format that is transcoded on the fly into a block compression format (dependent on GPU arch) that can be used by the GPU.

The biggest difference is that the initial compression format is designed to allow for transcoding without the need of a recompression step.

You can say that the patents’ intent is to accomplish what crunch does with an alternative method.

We have absolutely no idea how BCPack works other than the assumption that “BC” is related to BCn formats. Plus the patents don’t require JPEG, it’s just offered as one example of the different GPU-incompatible compression schemes that can be used.

Maybe it's a misunderstanding on my part, but I don't follow. BCPACK, like Kraken and Crunch, is rumored to be the further compression of BCn formats. You decompress a BCPACK-compressed texture block through the hardware decompressor and you end up with your texture in BCn format. Completely straightforward, with no recompression step. What is implied in the patent is that you can directly transcode from a plurality of hardware-incompatible formats into block-compressed textures using ML and hints contained in the metadata.

Ronaldo8 · Jul 4, 2020

Ronaldo8 said:
Maybe it's a misunderstanding on my part, but I don't follow. BCPACK, like Kraken and Crunch, is rumored to be the further compression of BCn formats. You decompress a BCPACK-compressed texture block through the hardware decompressor and you end up with your texture in BCn format. Completely straightforward, with no recompression step. What is implied in the patent is that you can directly transcode from a plurality of hardware-incompatible formats into block-compressed textures using ML and hints contained in the metadata.

This should bypass the need for a specific hardware decompression/transcoding engine provided you have enough compute at runtime to dedicate to it (huge implication with regards to PC vs next-gen consoles). Also BCPACK was described as the further compression of BCn formats by James Stanard himself, which was what originally caught the attention of Richard Geldreich, the inventor of Crunch.

dobwal · Jul 4, 2020

Ronaldo8 said:
Maybe it's a misunderstanding on my part, but I don't follow. BCPACK, like Kraken and Crunch, is rumored to be the further compression of BCn formats. You decompress a BCPACK-compressed texture block through the hardware decompressor and you end up with your texture in BCn format. Completely straightforward, with no transcoding/recompression step. What is implied in the patent is that you can directly transcode from a plurality of hardware-incompatible formats into block-compressed textures using ML and hints contained in the metadata.

Kraken is just a lossless compression scheme like zlib, but it offers slightly better compression and way better decoding speeds.

Crunch involves lossy compression that is incompatible with GPUs but offers similar performance to JPEG. The format is transcoded on the fly to a BCn format into VRAM. There is an RDO-LZ mode that’s offered to provide easier integration. Crunch offers better compression than its RDO mode. Its most promoted feature is that the initial compression format can be used across different GPU archs because its transcoder supports different texture block compression formats (BCn, ETC, PVRT, etc).

We are assuming that the PS5 does something similar to RDO-LZ but with Kraken (RDO+Kraken). There is an assumption that the XSX offers something similar in BCPack.

But ultimately that’s an assumption as we know little of BCPack, so when you asked if the patents were BCPack related, I simply stated “I don’t know”. The patents look something like crunch with its transcoding mode, except they don’t seem to avoid the recompression step but rather find different ways to speed it up using a metadata file.

Also, RDO+LZ or Kraken does look more straightforward. That’s why I stated the patents seem to involve a lot of real-time hardware work to do what can simply be done offline.
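For anyone wondering why the RDO half of RDO+LZ matters, here is a deliberately crude Python demonstration using zlib as the lossless stage. It is not how crunch, Kraken or any real RDO encoder works: the "RDO" step is faked by drawing every 8-byte block from a small codebook, which stands in for an encoder that accepts a little quality loss in order to reuse bit patterns. Block count and codebook size are arbitrary.

```python
import random
import zlib

random.seed(0)
BLOCK_BYTES = 8    # a BC1 block is 8 bytes
NUM_BLOCKS = 4096  # arbitrary texture size for the demo

# Stand-in for encoder output with no rate awareness: every block is distinct noise.
plain_blocks = [
    bytes(random.randrange(256) for _ in range(BLOCK_BYTES))
    for _ in range(NUM_BLOCKS)
]

# Stand-in for RDO output: blocks are reused from a small codebook,
# so the LZ stage finds many repeated byte patterns.
codebook = plain_blocks[:64]
rdo_blocks = [random.choice(codebook) for _ in range(NUM_BLOCKS)]

plain_data = b"".join(plain_blocks)
rdo_data = b"".join(rdo_blocks)

plain_size = len(zlib.compress(plain_data, 9))
rdo_size = len(zlib.compress(rdo_data, 9))

# The lossless stage round-trips exactly, so what comes out is still valid block data.
restored = zlib.decompress(zlib.compress(rdo_data, 9))

print("raw bytes:          ", len(plain_data))
print("zlib, no RDO:       ", plain_size)
print("zlib, RDO-like data:", rdo_size)
print("round trip intact:  ", restored == rdo_data)
```

The decompressor's output is directly usable block data in both cases; the only thing RDO changes is how well the lossless stage can pack it, which is why the runtime side stays so simple.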

Ronaldo8 · Jul 4, 2020

Ronaldo8 said:
This should bypass the need for a specific hardware decompression/transcoding engine provided you have enough compute at runtime to dedicate to it.

dobwal said:
Kraken is just a lossless compression scheme like zlib, but it offers slightly better compression and way better decoding speeds.

Crunch involves lossy compression that is incompatible with GPUs but offers similar performance to JPEG. The format is transcoded on the fly to a BCn format into VRAM. There is an RDO-LZ mode that’s offered to provide easier integration.

We are assuming that the PS5 does something similar to RDO-LZ but with Kraken (RDO+Kraken). There is an assumption that the XSX offers something similar in BCPack.

But ultimately that’s an assumption as we know little of BCPack, so when you asked if the patents were BCPack related, I simply stated “I don’t know”. The patents look something like crunch with its transcoding mode, except they don’t seem to avoid the recompression step but rather find a way to speed it up using a metadata file.

It's not a recompression step, it's direct transcoding into a BCn format using trial and error (the operation can even time out). Because the search space of mode/endpoint configurations is very large, metadata is created during the initial compression to constrict the search space.

Ronaldo8 · Jul 4, 2020

https://twitter.com/x/status/1241076025477357568

dobwal · Jul 4, 2020

Ronaldo8 said:
It's not a recompression step, it's direct transcoding into a BCn format using trial and error (the operation can even time out). Because the search space of mode/endpoint configurations is very large, metadata is created during the initial compression to constrict the search space.

Transcoding is often a two-step process, with the initial step being decoding to an intermediate uncompressed format. The second step is to encode into the desired format. Crunch seems to have the ability to avoid the intermediate step. The initial compression (CRN) format is designed to allow for conversion from the compressed format itself. Or at least that’s my impression from the writings of Geldreich, the creator of crunch.
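A minimal Python sketch of the generic two-step case, for reference. zlib-compressed raw texels stand in for the GPU-incompatible storage format, and the block encoder is a made-up min/max scheme with 2-bit indices; none of this reflects how crunch or the patents actually encode anything.

```python
import zlib


def decode_to_raw(stored):
    """Step 1: undo the storage format (zlib as a stand-in) to get raw texels."""
    return list(zlib.decompress(stored))


def encode_block(texels):
    """Step 2: toy block encode, min/max endpoints plus a 2-bit index per texel."""
    lo, hi = min(texels), max(texels)
    span = max(hi - lo, 1)  # avoid dividing by zero on flat blocks
    indices = [round(3 * (t - lo) / span) for t in texels]
    return lo, hi, indices


def decode_block(lo, hi, indices):
    """What the GPU side would do: interpolate between the endpoints per index."""
    return [lo + (hi - lo) * i // 3 for i in indices]


# Made-up 16-texel greyscale block, stored in the stand-in disk format.
raw = bytes([10, 10, 40, 40, 70, 70, 100, 100] * 2)
stored = zlib.compress(raw)

texels = decode_to_raw(stored)          # intermediate uncompressed data
lo, hi, indices = encode_block(texels)  # re-encode into the block format
print(lo, hi, indices)
print(decode_block(lo, hi, indices) == list(raw))
```

The intermediate `texels` list is the part a format designed for direct transcoding would try to skip, and the `encode_block` search is the part the patents try to make cheaper with hints.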

And my impression from the patents is that the metadata file isn’t generated by the initial GPU-incompatible compression step. Rather, an ML algorithm basically repeatedly BCn block compresses and decompresses the texture and figures out the best config to use at a given quality. A metadata file is generated and then the texture is compressed using something with similar performance to JPEG. The metadata is fed to the transcoder, which helps speed up performance.

Ronaldo8 · Jul 4, 2020

dobwal said:
Kraken is just a lossless compression scheme like zlib, but it offers slightly better compression and way better decoding speeds.

Crunch involves lossy compression that is incompatible with GPUs but offers similar performance to JPEG. The format is transcoded on the fly to a BCn format into VRAM. There is an RDO-LZ mode that’s offered to provide easier integration. Crunch offers better compression than its RDO mode. Its most promoted feature is that the initial compression format can be used across different GPU archs because its transcoder supports different texture block compression formats (BCn, ETC, PVRT, etc).

We are assuming that the PS5 does something similar to RDO-LZ but with Kraken (RDO+Kraken). There is an assumption that the XSX offers something similar in BCPack.

But ultimately that’s an assumption as we know little of BCPack, so when you asked if the patents were BCPack related, I simply stated “I don’t know”. The patents look something like crunch with its transcoding mode, except they don’t seem to avoid the recompression step but rather find different ways to speed it up using a metadata file.

Also, RDO+LZ or Kraken does look more straightforward. That’s why I stated the patents seem to involve a lot of real-time hardware work to do what can simply be done offline.

Compression into a GPU-incompatible format on disk is done offline, and the result is available for run-time transcoding on the fly from disk. That's like the whole point of the patent. The interesting feature is that the transcoding engine can actually be an ML model run via async compute, which can be important for PCs that lack the hardware decompression block of the consoles but have dedicated tensor cores instead.
