AI Desktop Tools - Training and Inferencing

Should an LLM running inside a Linux Docker container on my Windows PC be considered remote by Warp?

It seems to use OpenAI and sends all requests directly to OpenAI, so it's currently remote only.
It'd be nice if it could use a local LLM, such as one of Mistral's models, in the future, at least as an option.
Docker Hub is the world’s largest repository for container images with an extensive collection of AI/ML development-focused container images, including leading frameworks and tools such as PyTorch, TensorFlow, Langchain, Hugging Face, and Ollama. With more than 100 million pull requests for AI/ML-related images, Docker Hub’s significance to the developer community is self-evident. It not only simplifies the development of AI/ML applications but also democratizes innovation, making AI technologies accessible to developers across the globe.

NVIDIA’s Docker Hub library offers a suite of container images that harness the power of accelerated computing, supplementing NVIDIA’s API catalog. Docker Hub’s vast audience — which includes approximately 27 million monthly active IPs, showcasing an impressive 47% year-over-year growth — can use these container images to enhance AI performance.

Docker Desktop on Windows and Mac helps deliver a smooth experience for NVIDIA AI Workbench developers on local and remote machines.

NVIDIA AI Workbench is an easy-to-use toolkit (free) that allows developers to create, test, and customize AI and machine learning models on their PC or workstation and scale them to the data center or public cloud. It simplifies interactive development workflows while automating technical tasks that halt beginners and derail experts. AI Workbench makes workstation setup and configuration fast and easy. Example projects are also included to help developers get started even faster with their own data and use cases.

Docker engineering teams are collaborating with NVIDIA to improve the user experience with NVIDIA GPU-accelerated platforms through recent improvements to the AI Workbench installation on WSL2.

Check out how NVIDIA AI Workbench can be used locally to tune a generative image model to produce more accurate prompted results.

LangChain is pretty neat for using a local LLM to do local things. As I slowly suck less at this stuff, I'm now experimenting with using LangChain to permit my local Mistral LLM to take actions within my local HomeAssistant environment. Being able to ask the AI to turn lights off and on, or fans, or TVs, or even to adjust the thermostat temperature, or to ask where our Tesla is (finally got that working, although it's still a little wonky with the car in motion.)
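To make the wiring concrete, here is a minimal sketch of the kind of function a LangChain tool wrapper would expose to a local model. Home Assistant's REST API service-call endpoint (`POST /api/services/<domain>/<service>` with a bearer token) is real; the host address and entity IDs below are hypothetical placeholders.

```python
import json
import urllib.request

# Assumption: a local Home Assistant instance at this address with a
# long-lived access token; adjust to your setup.
HA_URL = "http://homeassistant.local:8123"

def build_service_call(domain, service, entity_id, token):
    """Build a request for Home Assistant's REST service-call endpoint,
    e.g. POST /api/services/light/turn_off."""
    url = f"{HA_URL}/api/services/{domain}/{service}"
    body = json.dumps({"entity_id": entity_id}).encode()
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

def turn_off_lights(entity_id, token):
    """The kind of action a LangChain tool can wrap so the local LLM
    can invoke it; actually sending the request (urllib.request.urlopen)
    requires a running Home Assistant instance."""
    return build_service_call("light", "turn_off", entity_id, token)
```

A LangChain `Tool` wrapping `turn_off_lights` (plus matching tools for fans, the thermostat, and so on) is what lets the local Mistral model act on a plain-English request.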

Being able to host a voice AI like Alexa or Google Assistant without any internet connection requirement is pretty swanky.
NVIDIA AI Workbench, a toolkit for AI and ML developers, is now generally available as a free download. It features automation that removes roadblocks for novice developers and makes experts more productive.

Developers can experience a fast and reliable GPU environment setup and the freedom to work, manage, and collaborate across heterogeneous platforms regardless of skill level. Enterprise support is also available for customers who purchase a license for NVIDIA AI Enterprise.


Key AI Workbench features include:

  • Fast installation, setup, and configuration for GPU-based development environments.
  • Pre-built, ready-to-go generative AI and ML example projects based on the latest models.
  • Deploy generative AI models with cloud endpoints from the NVIDIA API catalog or locally with NVIDIA NIM microservices.
  • An intuitive UX plus command line interface (CLI).
  • Easy reproducibility and portability across development environments.
  • Automation for Git and container-based developer environments.
  • Version control and management for containers and Git repositories.
  • Integrations with GitHub, GitLab, and the NVIDIA NGC catalog.
  • Transparent handling of credentials, secrets, and file system changes.
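The cloud-or-local deployment point above works because both the NVIDIA API catalog and local NIM microservices expose OpenAI-compatible chat endpoints. A minimal sketch follows; the URLs, port, and model name are assumptions to adapt to your deployment.

```python
import json
import urllib.request

# Assumptions: the API catalog's hosted endpoint and a local NIM container
# on its typical port; both speak the OpenAI chat-completions format.
CLOUD_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
LOCAL_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(url, model, prompt, api_key=None):
    """Build an OpenAI-style chat-completion HTTP request.

    Sending it (urllib.request.urlopen) requires a running NIM container
    for LOCAL_URL, or an API key for CLOUD_URL.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

The point of the shared format is that a project can switch between a catalog endpoint and a local NIM by changing only the URL and credentials, not the request-building code.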
Since its Beta release, AI Workbench also has several new key features:

  • Visual Studio (VS) Code support: Directly integrated with VS Code to orchestrate containerized projects on GPU environments.
  • Choice of base images: Users can choose their own container image as the project base image when creating projects. The container image must use image labels that follow the base image specifications.
  • Improved package management: Users can manage and add packages directly to containers through the Workbench user interface.
  • Installation improvements: Users have an easier install path on Windows and macOS. There is also improved support for the Docker container runtime.
March 27, 2024
Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and which showcases new hardware, software, tools and accelerations for RTX PC users.

Now, the TensorRT extension for the popular Stable Diffusion WebUI by Automatic1111 is adding support for ControlNets, tools that give users more control to refine generative outputs by adding other images as guidance.

Plus, the TensorRT extension for Stable Diffusion WebUI boosts performance by up to 2x — significantly streamlining Stable Diffusion workflows.

With the extension’s latest update, TensorRT optimizations extend to ControlNets — a set of AI models that help guide a diffusion model’s output by adding extra conditions. With TensorRT, ControlNets are 40% faster.

Users can guide aspects of the output to match an input image, which gives them more control over the final image. They can also use multiple ControlNets together for even greater control. A ControlNet can be a depth map, edge map, normal map or keypoint detection model, among others.
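To make the "edge map as guidance" idea concrete, here is a deliberately tiny, library-free sketch of turning a grayscale image into a crude edge map of the kind a ControlNet consumes as a conditioning image. Real pipelines use proper detectors such as Canny, and the WebUI extension handles this step for you; this is only an illustration of what the control image is.

```python
def edge_map(image, threshold=30):
    """Crude edge map: mark pixels whose right or down neighbor differs sharply.

    `image` is a 2D list of grayscale values (0-255). The result is a
    same-size map of 0/255 values, i.e. a simple conditioning image that
    traces the input's structure for the diffusion model to follow.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = abs(image[y][x] - image[y][x + 1]) if x + 1 < w else 0
            down = abs(image[y][x] - image[y + 1][x]) if y + 1 < h else 0
            if max(right, down) > threshold:
                out[y][x] = 255
    return out
```

The diffusion model then generates freely everywhere except along these marked structures, which the ControlNet nudges the output to respect.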

If you're an early adopter of ChatRTX, you should probably update to the latest March 2024 build. The UI contained a couple of 'Medium' and 'High' severity security vulnerabilities. According to the security bulletin, the more dangerous of the two (given an 8.2 rating) lets potential attackers gain access to system files. This exploit could lead to an "escalation of privileges, information disclosure, and data tampering."

The second security vulnerability, rated 6.5, doesn't sound much better. The exploit allows attackers to run "malicious scripts in users' browsers," which can cause denial of service, information disclosure, and even code execution.

The good news is that the latest version of ChatRTX with the new security updates is available to download. NVIDIA credits those who pointed out these exploits in its update, and there's no evidence of them being used to date.

Google Making Major Changes in AI Operations to Pull in Cash from Gemini

April 4, 2024
Google has also put a price on using its Gemini API and cut off most of its free access to its APIs. The message is clear: the party is over for developers looking for AI freebies, and Google wants to make money off AI tools such as Gemini.

Google had been providing developers free access to the APIs for its LLMs, both older and newer, in an attempt to woo them into adopting its AI products.
Google is attracting developers via its cloud service and AI Studio service. For now, developers can get free API keys on Google’s website, which provides access to Google’s LLMs through a chatbot interface. Developers and users have until now enjoyed free access to Google’s LLMs, but that is also ending.

This week, Google threw a double whammy that effectively shuts down free access to its APIs via AI Studio.
Google also announced this week that it is restricting API access to its Google Gemini model in a bid to turn free users into paid customers. Free access to Gemini allowed many companies to offer chatbots based on the LLM for free, but Google’s changes will likely mean many of those chatbots will shut down.

“Pay-as-you-go pricing for the Gemini API will be introduced,” Google said in an email on Monday to developers.

The free plan includes two requests per minute, 32,000 tokens per minute, and a maximum of 50 requests per day. However, one drawback is that Google will use chatbot responses to improve its products, which purportedly include its LLMs.
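Under limits that tight, a client needs its own throttling to avoid rejected calls. A minimal sketch of a client-side check against the quoted free-tier request limits (2 requests per minute, 50 per day; the 32,000 tokens-per-minute budget could be tracked the same way):

```python
from collections import deque

class FreeTierLimiter:
    """Client-side throttle for the quoted Gemini free-tier request limits:
    2 requests per minute and 50 requests per day (token budget omitted)."""

    def __init__(self, per_minute=2, per_day=50):
        self.per_minute = per_minute
        self.per_day = per_day
        self.sent = deque()  # timestamps (seconds) of sent requests

    def allow(self, now):
        """Return True and record the request if it fits both windows."""
        # Drop timestamps older than a day; what remains is the daily count.
        while self.sent and now - self.sent[0] >= 86400:
            self.sent.popleft()
        last_minute = sum(1 for t in self.sent if now - t < 60)
        if last_minute >= self.per_minute or len(self.sent) >= self.per_day:
            return False
        self.sent.append(now)
        return True
```

The caller checks `allow(time.time())` before each API request and sleeps or queues when it returns False.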
The hundreds of billions being spent on data centers to run AI is a gamble, as the companies do not have proven AI revenue models. As use of the LLMs grows, small revenue streams through offerings like APIs could contribute to the cost of building the hardware and data centers.

Bloomberg recently reported that Amazon was spending $150 billion over 15 years to establish new data centers.

OpenAI and Microsoft plan to spend $100 billion on a supercomputer called Stargate, according to The Information.

For customers unwilling to pay, Google has released the Gemma large language models, around which customers can build their own AI applications. Other open-source models, such as Mistral, are also gaining in popularity.

Customers are leaning toward open-source LLMs as the cost of AI grows. These models can be downloaded and run on custom hardware tuned for the applications, but most customers can't afford that hardware, which in most cases means Nvidia's GPUs. AI hardware is also not easily available off the shelf.

One positive is that Chat with RTX currently provides access to two open-source AI models, Mistral and Llama. Access to more free models is planned for the future.