Tutorials AI & LLM Engineering for .NET Architects
Edge AI: Deploying models to local devices and private clouds
On this page
AI at the Edge
The cloud is not everywhere. Edge AI is about bringing the power of the LLM to the device where the data is born—whether that's a phone, a factory sensor, or a private server in a hospital.
1. The "Cloud-Cloud" vs "Edge-Cloud" Hybrid
Modern architecture uses a hybrid approach:
- Edge: Performs sensitive data filtering, PII removal, and basic intent detection.
- Cloud: Only receives the 'Clean' data for high-end reasoning if the Edge can't handle it.
2. Private AI Clusters
Enterprises are building "Local AI Clusters" using tools like **Ollama** or **vLLM** hosted in their own Kubernetes clusters. This gives them a private GPT endpoint that their internal developers can use without the data ever touching the public internet.
4. Interview Mastery
Q: "What is 'Latency Sensitive' AI?"
Architect Answer: "Latency-sensitive AI is where a 1-second delay is unacceptable (e.g., self-driving cars or real-time translation). For these, we must use **In-Process Inference**. We compile the ONNX model directly into our C# binary. By avoiding the 'Network Roundtrip' to a cloud API, we reduce the response time from 1,500ms to <20ms."