You don't need a REST API to run AI. With ONNX Runtime and Microsoft.Extensions.AI, you can run models directly inside your C# process.
ONNX (Open Neural Network Exchange) is an open, framework-neutral format for AI models. It lets you take a model trained in Python/PyTorch and run it in a C# app with high performance. ONNX Runtime can dispatch work to hardware-specific execution providers such as **CUDA** (Nvidia GPUs) or **DirectML** (Windows GPUs), which is dramatically faster than CPU-only inference.
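With the lower-level ONNX Runtime API, the execution-provider choice is explicit. A minimal sketch, assuming the matching NuGet package (`Microsoft.ML.OnnxRuntime.Gpu` for CUDA, `Microsoft.ML.OnnxRuntime.DirectML` for DirectML) is installed and a `model.onnx` file exists on disk (the path here is hypothetical):

```csharp
using Microsoft.ML.OnnxRuntime;

var options = new SessionOptions();
// Pick an execution provider; ONNX Runtime falls back to CPU if none is set.
// options.AppendExecutionProvider_CUDA(0);  // Nvidia GPU, device 0
// options.AppendExecutionProvider_DML(0);   // DirectML on Windows, device 0
using var session = new InferenceSession("model.onnx", options);  // hypothetical model path
```

The same `InferenceSession` code runs unchanged whichever provider you uncomment, which is the practical payoff of the ONNX format: the accelerator is a configuration detail, not a code change.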
The **Microsoft.ML.OnnxRuntimeGenAI** package is the new "one-liner" library for running LLMs in C#.
using var model = new Model("phi-3-mini-onnx");   // path to the ONNX model folder
using var tokenizer = new Tokenizer(model);
using var generatorParams = new GeneratorParams(model);
using var generator = new Generator(model, generatorParams);
// Generate text locally!
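The snippet above only wires up the objects. A full generation loop looks roughly like this (a sketch against the Microsoft.ML.OnnxRuntimeGenAI preview API; method names such as `AppendTokenSequences` have shifted between preview releases, so check them against your installed package version, and note the model folder path and prompt template are assumptions for Phi-3-mini):

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

// Assumes a Phi-3-mini ONNX model has been downloaded into this folder.
using var model = new Model("phi-3-mini-onnx");
using var tokenizer = new Tokenizer(model);

using var generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 256);

using var generator = new Generator(model, generatorParams);
generator.AppendTokenSequences(
    tokenizer.Encode("<|user|>Why is the sky blue?<|end|><|assistant|>"));

// Decode token by token until the model emits an end-of-sequence token.
while (!generator.IsDone())
{
    generator.GenerateNextToken();
}

Console.WriteLine(tokenizer.Decode(generator.GetSequence(0)));
```

If you prefer the Microsoft.Extensions.AI abstractions, the same ecosystem also ships an `OnnxRuntimeGenAIChatClient` implementing `IChatClient`, so a local model can sit behind the same interface your app would use for a cloud provider.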
Q: "What are the hardware requirements for running a local LLM?"
Architect Answer: "The most important factor is **VRAM (Video RAM)** on the GPU. A quantized 7B-parameter model needs about 5-6 GB of VRAM. If the model fits entirely in VRAM, generation is fast; if it overflows into system RAM, token throughput can drop by roughly 10x. For a professional AI workstation, we recommend at least 16 GB of VRAM (RTX 4080 or better) to run modern SLMs comfortably."
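The 5-6 GB figure can be sanity-checked with back-of-the-envelope arithmetic. In the sketch below, the quantization width and the overhead allowance for the KV cache and runtime buffers are assumptions, not measured values:

```csharp
// Rough VRAM estimate: quantized weights plus a fixed overhead allowance.
double paramCountBillions = 7.0;
double bitsPerParam = 4.0;  // typical 4-bit (Q4) quantization
double weightsGb = paramCountBillions * bitsPerParam / 8.0;  // 3.5 GB of weights
double overheadGb = 1.5;    // assumption: KV cache, activations, runtime buffers
Console.WriteLine($"~{weightsGb + overheadGb:F1} GB VRAM");  // prints "~5.0 GB VRAM"
```

Longer context windows grow the KV cache, so the overhead term rises with `max_length`; that is why a model that "fits" at short contexts can still spill into system RAM during long conversations.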