Thanks, I was getting tired of dismissive posts that said “patents aren’t products.” You mixed it up on this one, so good on you.
Well, sorry man. This wasn't meant to be dismissive of Sony. Everyone has the same problem.
Unless Nvidia wants to start handing out models, everyone is stuck unless they want to build their own. Which, by all accounts, can be very expensive.
To put things into perspective, Google, Amazon, and MS run the largest cloud platforms for AI processing. None of them has a DLSS-equivalent model. Facebook is trying, but has something inferior to Nvidia's as I understand it. Even running on RTX AI hardware, it's orders of magnitude away from DLSS performance.
MS can tout ML capabilities on the console, but with no model it's pointless. The technology for AI is in the model; the hardware to run it is trivial.
Further explanation on this front: a trained model consists of the data, the processing, and the network. Even if you have the neural network to train with, and let's say it's open source, you still need the data, and then you need the processing power.
To put things into perspective, BERT is a transformer network for natural language processing. It can read sentences and understand context because it reads both forwards and backwards. BERT the network is open source. The data is not. The data source is Wikipedia (the whole of Wikipedia is read into BERT for training), but you'd still have to preprocess the data before it can be used for training. Assuming you had a setup capable of handling that much data, you then get into the compute part of the equation. Simply put, only a handful of companies in the world can train a proper BERT model. So while there are all sorts of white papers on BERT, small teams can't verify the results or keep up because the compute requirements are so high.
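To make the network-versus-model distinction concrete, here's a minimal sketch (assuming the Hugging Face transformers library, which ships the open-source BERT architecture) of how little the architecture alone buys you:

```python
# A minimal sketch: the open-source architecture gives you an *untrained* BERT.
from transformers import BertConfig, BertForMaskedLM

config = BertConfig()              # defaults roughly match BERT-Base (12 layers, hidden size 768)
model = BertForMaskedLM(config)    # weights are randomly initialized at this point

n_params = sum(p.numel() for p in model.parameters())
print(f"Parameters to train: ~{n_params / 1e6:.0f}M")   # roughly 110M for BERT-Base

# Turning those random weights into a useful model is the expensive part:
# you still need the preprocessed corpus (e.g. all of Wikipedia) and days of
# TPU/GPU time for pretraining, which is exactly the cost discussed below.
```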
For a single training:
How long does it take to pre-train BERT?
BERT-base was trained on 4 Cloud TPUs for 4 days and BERT-large was trained on 16 Cloud TPUs for 4 days. There is a recent paper that talks about bringing down BERT pre-training time: "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes."
If you make any change to it, any change to the network or the data set, that's another 4 days of training before you can see the result. Iteration time is very slow without more horsepower for these complex networks.
***
Google BERT — estimated total training cost: US$6,912

Released last year by Google Research, BERT is a bidirectional transformer model that redefined the state of the art for 11 natural language processing tasks.

From the Google research paper: “training of BERT-Large was performed on 16 Cloud TPUs (64 TPU chips total). Each pretraining took 4 days to complete.” Assuming the training device was Cloud TPU v2, the total price of one-time pretraining should be 16 (devices) * 4 (days) * 24 (hours) * 4.5 (US$ per hour) = US$6,912. Google suggests researchers with tight budgets could pretrain a smaller BERT-Base model on a single preemptible Cloud TPU v2, which takes about two weeks with a cost of about US$500.
...
What may surprise many is the staggering cost of training an XLNet model.
A recent tweet from Elliot Turner — the serial entrepreneur and AI expert who is now the CEO and Co-Founder of Hologram AI — has prompted heated discussion on social media.
Turner wrote “it costs $245,000 to train the XLNet model (the one that’s beating BERT on NLP tasks).” His calculation is based on a resource breakdown provided in the paper: “We train XLNet-Large on 512 TPU v3 chips for 500K steps with an Adam optimizer, linear learning rate decay and a batch size of 2048, which takes about 2.5 days.”
***
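For what it's worth, the arithmetic in those quotes is easy to sanity-check. A quick sketch using only the figures quoted above (the US$4.50/hour Cloud TPU v2 rate comes straight from the BERT quote; the XLNet line just back-solves the per-chip rate that Turner's $245,000 implies, since his exact pricing assumption isn't given in the quote):

```python
# Sanity-checking the quoted training-cost figures with the numbers given above.

# BERT-Large: 16 Cloud TPU v2 devices for 4 days at an assumed US$4.50 per device-hour.
bert_cost = 16 * 4 * 24 * 4.50
print(f"BERT-Large one-time pretraining: ${bert_cost:,.0f}")   # -> $6,912

# XLNet-Large: 512 TPU v3 chips for ~2.5 days, quoted total of US$245,000.
# Back-solve the per-chip-hour rate that figure implies.
xlnet_chip_hours = 512 * 2.5 * 24
print(f"Implied XLNet rate: ${245_000 / xlnet_chip_hours:.2f} per chip-hour")  # -> ~$7.98
```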
None of these costs account for the R&D overhead: how many times they had to run training just to get the result they wanted, or the labour and education required of the researchers. The above is just the cost of running the hardware.
Nvidia has been in this business since the beginning, soaking up a ton of AI research talent. They have the hardware, the resources, and the subject matter expertise from a long legacy in graphics to make it happen. It's understandable how they were able to create the models for DLSS.
I frankly can't see anyone else being able to pull this off. Not nearly as effectively. At least not anytime soon.