Skip to main content

Microsoft and Nvidia are making it easier to run AI models on Windows

Microsoft and Nvidia are making it easier to run AI models on Windows


Microsoft’s new Windows AI Studio lets developers access and configure AI models, such as Microsoft’s Phi, Meta’s Llama 2, and Mistral.

Share this story

Illustration of Microsoft’s Windows logo
Illustration by Alex Castro / The Verge

Microsoft and Nvidia want to help developers run and configure AI models on their Windows PCs. During the Microsoft Ignite event on Wednesday, Microsoft announced Windows AI Studio: a new hub where developers can access AI models and tweak them to suit their needs.

Windows AI Studio allows developers to access development tools and models from the existing Azure AI Studio and other services like Hugging Face. It also offers an end-to-end “guided workspace setup” with model configuration UI and walkthroughs to fine-tune various small language models (SLMs), such as Microsoft’s Phi, Meta’s Llama 2, and Mistral.

Image: Microsoft

Windows AI Studio lets developers test the performance of their models using Prompt Flow and Gradio templates as well. Microsoft says it’s going to roll out Windows AI Studio as a Visual Studio Code extension in the “coming weeks.”

Nvidia, similarly, revealed updates to TensorRT-LLM, which the company initially launched for Windows as a way to run large language models (LLMs) more efficiently on H100 GPUs. However, this latest update brings TensorRT-LLM to PCs powered by GeForce RTX 30 and 40 Series GPUs with 8GB of RAM or more.

Additionally, Nvidia will soon make its TensorRT-LLM compatible with OpenAI’s Chat API through a new wrapper. This will allow developers to run LLMs locally on their PCs, which is ideal for those who are concerned about storing private data in the cloud. Nvidia says its next TensorRT-LLM 6.0 release will add up to five times faster inference, as well as support for the new Mistral 7B and Nemotron-3 8B models. 

This is all a part of Microsoft’s goal to create a “hybrid loop” development pattern, which is supposed to enable AI development across the cloud and locally on devices. With this concept, developers don’t have to rely solely on their own systems to power AI development, as they can access Microsoft’s cloud servers to take the brunt of the load off their devices.